Evaluation Jobs

Posted 12 hours ago

Design and ship production-grade evaluation infrastructure for cutting-edge AI agents while leading customer-facing certifications and shaping product strategy at AIUC.

Senior Solutions Architect

Tech Firefly Hybrid No location specified

Posted 15 hours ago

Lead the technical architecture and cross-domain dependency mapping for a fast-paced, remote contract engagement supporting an academic medical center’s multi-year healthcare technology rollout.

Project Lion - Lead Prompt Engineer - United States (Remote, Part-Time)

Welo Global Hybrid United States

Posted 17 hours ago

Lead a U.S.-based team to migrate template systems to LLM autoraters and optimize model performance using advanced prompt engineering and evaluation methods.

*Scout Search Quality Rater - English (United States)

Welo Global Hybrid United States

Posted yesterday

Welo Data is seeking US-based English speakers to remotely evaluate and rate search results to improve search relevancy and AI performance.

Strategy & Business Operations Lead

Picogrid Hybrid El Segundo

Posted 2 days ago

Picogrid seeks a Strategy & Business Operations Lead to design and run the internal systems, metrics, and cross-functional programs that will let the company scale efficiently during rapid growth.

AI Engineer, Quality (Evals)

Fieldguide Hybrid San Francisco

Posted 2 days ago

Lead the design and implementation of evaluation infrastructure and observability for enterprise-grade AI agents powering audit and assurance workflows at Fieldguide's San Francisco office.

Applied AI Engineer

Awesome Motive Hybrid New York City

Posted 2 days ago

Apply state-of-the-art AI to financial workflows at Rowspace by building retrieval systems, agentic pipelines, and evaluation frameworks that turn unstructured data into actionable investment insights.

PhD AI Research Intern

Latitude Hybrid United States

Posted 2 days ago

Inclusive & Diverse

Rise from Within

Mission Driven

Diversity of Opinions

Work/Life Harmony

Passion for Exploration

Dare to be Different

Growth & Learning

Medical Insurance

Paid Time-Off

Maternity Leave

Equity

Learning & Development

Dental Insurance

Vision Insurance

Latitude seeks a PhD AI research intern to build a benchmark library and evaluate SOTA LLM behavior within our story engine, producing publishable results and a public report.

Engineer, Planning & Evaluation

Energy Trust of Oregon Hybrid No location specified

Posted 3 days ago

Energy Trust of Oregon seeks an Engineer, Planning & Evaluation to perform measure development, cost-benefit analyses, pilot design, and technical review to support cost-effective energy-efficiency programs.

Senior Software Engineer, GenAI Platform

Chime Financial, Inc Hybrid San Francisco, CA, USA

Posted 3 days ago

Help scale Chime's AI-powered Jade assistant by building platform tooling, backend services, and observability systems as a Senior Full-Stack Engineer.

SETA Test & Systems Engineering Advisor

KBR Hybrid Las Cruces, New Mexico

Posted 3 days ago

Experienced systems engineering and test & evaluation advisor needed to provide SETA support to the government for verification, test planning, execution, and evaluation of DoD systems.

Inpatient Occupational Therapist IHR-Casual As Needed

Northwestern Memorial Healthcare Hybrid 25 N. Winfield Rd., Winfield, IL

Posted 4 days ago

Northwestern Medicine is hiring a licensed Occupational Therapist (OTR/L) for per-diem inpatient care in Winfield, IL to provide evaluations, treatment, documentation, and interdisciplinary collaboration.

Wellness Associate

City of New York Hybrid New York City, NY

Posted 4 days ago

Support ACS’s Employee Wellness program by coordinating and delivering on-site wellness activities across NYC locations while tracking participation and reporting outcomes.

Senior Program Management Analyst

One Federal Solution Hybrid No location specified

Posted 4 days ago

Lead data-driven program performance analysis and provide actionable recommendations to support DoD and civilian federal programs as a Senior Program Management Analyst at One Federal Solution.

Machine Learning Engineer Intern - Research

Good At Numbers Hybrid No location specified

Posted 4 days ago

GoodAtNumbers is hiring a US-based remote Machine Learning Engineer Intern to push ML research into production by building, evaluating, and deploying reliable LLM-driven features during a paid 12-week summer internship.

Staff Software Engineer / Architect - AI, CoCounsel FDE

thomsonreuters Hybrid United States of America, Frisco, Texas

Posted 5 days ago

Lead the design and delivery of scalable, secure AI-native systems for sophisticated legal customers as a Staff Software Engineer / Architect on Thomson Reuters' CoCounsel FDE team.

Research Intern- AI Ethics

sonyglobal Hybrid Remote - California

Posted 5 days ago

Sony AI’s Research Ethics team is hiring a remote Research Intern to work on generative AI ethics, evaluation, and harm-mitigation research with opportunities for publication.

Site Lead - South Carolina

Foster America Hybrid No location specified

Posted 6 days ago

Serve as Foster America's South Carolina Site Lead to coordinate partners, drive implementation of the OPT-In initiative, and translate learning into sustained local impact for families.

Become a Luxury Brand Evaluator in Seattle/Bellevue, WA

CXG Hybrid No location specified

Posted 6 days ago

Evaluate luxury brand experiences in the Seattle/Bellevue area through short, flexible missions for CXG and help top brands improve service.

Senior Staff Machine Learning Engineer - Agentic Systems

Spotify Hybrid New York, NY

Posted 6 days ago

Inclusive & Diverse

Empathetic

Take Risks

Transparent & Candid

Feedback Forward

Mission Driven

Collaboration over Competition

Work/Life Harmony

Maternity Leave

Paternity Leave

Snacks

Medical Insurance

Dental Insurance

Vision Insurance

Mental Health Resources

Life insurance

401K Matching

Paid Sick Days

Paid Time-Off

Paid Volunteer Time

Lead the architecture and productionization of Spotify’s shared Agent Engine to power scalable, reliable agent-based experiences across the platform.

Learning & Development Manager

National Vision Hybrid 2435 Commerce Ave NW, Duluth, GA 30096, USA

Posted 7 days ago

Lead the People Development team at National Vision to design and deliver scalable, measurable learning solutions for corporate, retail, manufacturing, and clinical associates.

Director of AI Engineering

Cover Whale Hybrid No location specified

Posted 7 days ago

Lead and build the agentic AI platform that enables pods of engineers and AI agents to safely and reliably deliver production software at scale.

AI Engineer

LanguageWire Hybrid No location specified

Posted 7 days ago

LanguageWire is hiring an AI Engineer to design and productionize LLM-based translation workflows and bridge ML experimentation with production engineering.

Become a Luxury Brand Evaluator in Indianapolis - Apply Now

CXG Hybrid No location specified

Posted 8 days ago

Evaluate luxury brand experiences for CXG through flexible in-store or online missions that provide actionable feedback to premium brands.

AI Engineer

EQL Tech Hybrid No location specified

Posted 8 days ago

Work on a mission-driven fintech team to build and ship core AI products (LLM/VLM and evaluation pipelines) that power eligibility and compliance for education savings accounts.

Software Engineer II, Machine Learning Systems & Productization

Iambic Therapeutics, Inc Hybrid San Diego

Posted 8 days ago

Iambic Therapeutics seeks a Software Engineer II to co-develop and harden ML training, evaluation, and productization workflows that enable AI-driven drug discovery.

Engineering Manager, Applied AI

Mercor Hybrid No location specified

Posted 8 days ago

Lead and grow an Applied AI engineering team at Mercor to build scalable evaluation and data systems that measurably improve frontier model performance.

Application Engineering Intern

Renesas Electronics Hybrid Palm Bay, Florida

Posted 8 days ago

Application Engineering Intern at Renesas Hi-Rel to perform lab-based evaluations of power/ADC products, produce technical analysis, and present findings.

English (United States) > Japanese (Japan) Lyric Translation Reviewer

Welo Global Hybrid No location specified

Posted 9 days ago

Evaluate machine-translated English (US) to Japanese (Japan) song lyrics for meaning, fluency, and cultural accuracy on a flexible, remote freelance project with Welo Data.

Flight Test Integrations and Operations Manager - Mission Autonomy

Anduril Industries Hybrid San Clemente, California, United States

Posted 9 days ago

Anduril seeks an experienced manager to lead flight test integration and operations for UAS platforms, overseeing system integration, mesh networking, and Flight Test Operations as an RPIC.

Sr. NDE Engineer, Radiography Testing

SpaceX Hybrid Hawthorne, CA

Posted 9 days ago

Mission Driven

Social Impact Driven

Passion for Exploration

Reward & Recognition

Senior NDE Engineer (Radiography Testing) to design, prototype, and deploy advanced radiography and automated inspection solutions to improve manufacturing quality and flight reliability at SpaceX.

AI Product Engineer, Clinical Tools

knownwell Hybrid Remote

Posted 9 days ago

Lead the product vision and engineering for clinician-facing AI tools at knownwell, building and operating RAG-based clinical decision support with full product ownership and direct clinician partnership.

Senior AI Technical Product Manager - R01563914

Brillio Hybrid New York, New York, United States

Posted 9 days ago

Experienced technical product leader needed to own prioritization, quality, and stakeholder alignment for LLM-driven products while staying hands-on with architecture, code reviews, and AI cost optimization.

Machine Learning Engineer, AI Agent Platform

Arta Finance Hybrid Mountain View

Posted 10 days ago

Help build and deploy production AI agent platforms that power personalized financial advisory workflows for institutional clients at Arta.

Maps Personalization Relevance Rater - English (US)

Welo Global Hybrid United States

Posted 11 days ago

Contract freelance raters in the United States will evaluate personalized map and search recommendations using their Google Maps activity history and follow project guidelines to rate relevance and usefulness.

Shape the Future of AI — English Talent Hub

Welo Global Hybrid No location specified

Posted 11 days ago

Welo Data is building a flexible, remote contributor network of native English speakers to annotate, evaluate, and create prompts that improve AI systems.

Community Outreach Specialist - Pediatrics - Part time

Carilion Clinic Hybrid Roanoke, Virginia

Posted 12 days ago

Carilion Clinic is hiring a part-time Community Outreach Specialist to deliver evidence-based pediatric health education and support community partnerships across the Roanoke area.

English (United States) > German (Germany) Lyric Translation Reviewer

Welo Global Hybrid No location specified

Posted 12 days ago

Evaluate machine-translated English (US) to German (Germany) song lyrics for accuracy, fluency, and cultural appropriateness in a remote freelance role.

VP, Product (AI & Search) - Slack

Salesforce Hybrid California - San Francisco

Posted 12 days ago

Inclusive & Diverse

Rise from Within

Mission Driven

Diversity of Opinions

Work/Life Harmony

Feedback Forward

Take Risks

Collaboration over Competition

Medical Insurance

Dental Insurance

Vision Insurance

Paid Time-Off

Maternity Leave

Paternity Leave

Mental Health Resources

Life insurance

Disability Insurance

Health Savings Account (HSA)

Flexible Spending Account (FSA)

Employee Resource Groups

Lead Slack's search and AI platform as VP Product to set strategy, drive model and infrastructure decisions, and deliver reliable, scalable AI-powered search and knowledge services for enterprise users.

Senior Director, Search & Evaluation, Neuroscience

AbbVie Hybrid North Chicago, IL

Posted 12 days ago

Lead AbbVie's Neurosciences Search & Evaluation team to identify, assess, and advance high-value external partnering opportunities that strengthen the company’s neuroscience pipeline and strategic goals.

Forward Deployed Engineer

NICE Hybrid USA - Remote

Posted 12 days ago

NiCE is hiring a Forward Deployed Engineer to design, ship, and operate production-scale conversational AI agents that solve high-impact enterprise problems.

PSYCHOLOGIST CREDENTIALED

Montefiore Hybrid 2532 Grand Concourse

Posted 12 days ago

Montefiore is hiring a licensed Psychologist (PhD/PsyD) to conduct disability-related psychological assessments and clinical consultations for participants in the WeCARE employment-focused program.

AI Writing Evaluators (Domain Experts) - English Expertise

Volga Partners Hybrid No location specified

Posted 12 days ago

Experienced domain experts in Business Operations & Communications or Education and Academic Research are needed for a remote, retainer-based 2‑week role evaluating and crafting prompts for AI writing models with US-contextual standards.

Founding Forward Deployed Engineer

Artificial Intelligence Underwriting Company Hybrid San Francisco

Posted 12 days ago

Join an early-stage AI safety startup as a founding Forward Deployed Engineer to design rigorous AI evals, lead customer implementations, and shape product strategy for certification of real-world AI agents.

Become a Luxury Brand Evaluator in Hawaii - Apply Now

CXG Hybrid No location specified

Posted 12 days ago

Work as a freelance luxury brand evaluator for CXG, discreetly assessing boutique and online experiences to help premium brands refine their service.

Technical Advisor: Mental Health & Psychosocial Support - Occasional

theirc Hybrid New York, NY HQ USA

Posted 12 days ago

Serve as the MHPSS Technical Advisor for IRC RAI, providing evidence-based guidance, training, and partnership support to improve mental health and psychosocial services for forcibly displaced populations in the U.S.

Manager, Evaluation Faculty, School of Technology - Electrical and Computer Engineering

WGU Hybrid United States

Posted 12 days ago

Lead and develop a remote evaluation team in WGU’s School of Technology to ensure accurate, scalable competency-based assessment and continuous improvement for Electrical and Computer Engineering programs.

(Senior) Researcher

Epoch AI Hybrid Remote

Posted 13 days ago

Epoch AI is hiring remote Researchers and Senior Researchers to conduct data-driven investigations, build benchmarks, and forecast AI capabilities and trends.

Product Analyst - Generative AI Platform

Visa Hybrid Austin, TX