Browse 17 open jobs in Inference Optimization. Companies hiring include Hinge Health, Awesome Motive, and webAI, in locations such as Kansas City, Milwaukee, and Honolulu.
Lead the measurement, experimentation, and data architecture for HingeSelect as the first dedicated Staff Product Data Scientist driving causal analysis, funnel optimization, and supply-demand modeling.
Help architect and operate the systems that take neuroscience datasets from raw experiments through large-scale model training, evaluation, and optimized production inference at Metamorphic.
Senior Machine Learning Engineer needed to transform prototype AI models into optimized, production-ready systems for secure, distributed public sector and edge deployments.
Lead performance and scalability improvements for LLM inference by optimizing runtime components, multi-GPU execution, and open-source serving frameworks at scale.
Drive production-ready model optimization, custom kernel development, and edge deployment to enable real-time inference of large-scale models on vehicle SoCs for Zoox's Perception team.
Lead system- and hardware-focused optimizations for LinkedIn’s AI inference platform, improving GPU utilization, compiler workflows, and low-latency model serving at scale.
Lead the design and delivery of a closed-loop intelligence layer that enables an autonomous trading fleet to learn from real-time outcomes and improve profitability.
Twelve Labs is hiring a senior Machine Learning Engineer to optimize and scale multimodal video foundation models for deployment across cloud and data platforms.
Lead the design and deployment of low-latency, production ML systems for voice, audio, and agentic control at an early-stage hardware and software startup in New York City.
Tavus is hiring a Multimodal AI Model Optimization Research Engineer to convert cutting-edge multimodal models into efficient, low-latency production systems.
Work across modeling, systems, and product to design, optimize, and ship production-grade AI systems for real-world users.
Lead the development of custom quantization algorithms and low-precision techniques to maximize model performance on Quadric's Chimera GPNPU from our Burlingame engineering office.
Decagon is hiring a Senior ML Infrastructure Engineer to design and scale distributed training and multi-provider inference platforms for LLMs and multimodal models.
Metamorphic is hiring an ML Research Engineer (Performance Engineering) to implement and optimize GPU kernels, low-precision training, and MoE systems for next-generation foundation models.
Wizard AI is hiring a Senior MLOps Engineer to own and scale the production ML lifecycle for a real-time inference platform behind a conversational shopping agent.
Varick seeks an AI Engineer to architect and ship production-grade agent systems, evaluation pipelines, and retrieval-driven context strategies for enterprise AI deployments.
Lead developer-facing content and sample projects that help ML engineers train, fine-tune, and deploy models on Dexmate humanoid robots while shipping production-quality code weekly.
Below 50k*: 0 | 50k-100k*: 0 | Over 100k*: 5