Browse 13 Agent Evaluation jobs hiring now. Companies hiring include Preference Model, Artificial Intelligence Underwriting Company, and Fieldguide, with roles in St. Louis, Greensboro, and Shreveport.
Lead design and delivery of realistic, multi-step RL environments at an early-stage startup partnering with frontier AI labs to improve model robustness and training quality.
Design and ship production-grade evaluation infrastructure for cutting-edge AI agents while leading customer-facing certifications and shaping product strategy at AIUC.
Lead the design and implementation of evaluation infrastructure and observability for enterprise-grade AI agents powering audit and assurance workflows at Fieldguide's San Francisco office.
Help scale Chime's AI-powered Jade assistant by building platform tooling, backend services, and observability systems as a Senior Full-Stack Engineer.
Lead the architecture and productionization of Spotify’s shared Agent Engine to power scalable, reliable agent-based experiences across the platform.
Lead and build the agentic AI platform that enables pods of engineers and AI agents to safely and reliably deliver production software at scale.
Help build and deploy production AI agent platforms that power personalized financial advisory workflows for institutional clients at Arta.
A selective, eight-week (mostly virtual) unpaid bootcamp at ServiceNow for undergraduate students to learn agentic AI, build and evaluate agents, and present a capstone project during an in-person finale.
Senior engineering leader to design, evaluate and productionize agentic AI systems, prompt architectures and multi-agent orchestration for critical banking workflows at Deutsche Bank in Cary, NC.
Experienced software engineers with strong system-design and ML/LLM experience are needed to build and productionize LLM-powered agents, evaluation pipelines, and scalable AI infrastructure at Permute.
Fullscript is looking for a Staff Machine Learning Engineer to architect and ship production LLM-driven clinical features that improve clinician workflows and patient outcomes.
Work on TRM’s AI Engineering team to design and ship agentic LLM systems and scalable infrastructure that augment investigations and ensure safe, auditable behavior in high-sensitivity environments.
Varick seeks an AI Engineer to architect and ship production-grade agent systems, evaluation pipelines, and retrieval-driven context strategies for enterprise AI deployments.
Below 50k*: 0 | 50k-100k*: 0 | Over 100k*: 1