Browse 30 exciting jobs hiring in Ml Inference now. Check out companies hiring such as NVIDIA, Awesome Motive, Zoox in Worcester, St. Petersburg, Laredo.
Senior Architect role to design and implement high-performance AI communication and memory libraries while driving hardware-software co-optimization across GPUs, DPUs, NICs, and interconnects at NVIDIA.
Help architect and operate the systems that take neuroscience datasets from raw experiments through large-scale model training, evaluation, and optimized production inference at Metamorphic.
Lead and grow Zoox's ML Platform engineering organization to deliver scalable training and low-latency inference infrastructure for large foundation and RL models across vehicle and cloud environments.
Develop and productionize agent systems and the Friendli Agent API at FriendliAI to enable developers to build reliable, high-impact AI agent applications.
Senior Machine Learning Engineer needed to transform prototype AI models into optimized, production-ready systems for secure, distributed public sector and edge deployments.
Lead system- and hardware-focused optimizations for LinkedIn’s AI inference platform, improving GPU utilization, compiler workflows, and low-latency model serving at scale.
Lead DeepWalk’s computer vision platform as a Staff Software Engineer, driving the architecture and productionization of ML systems that process millions of images for sidewalk inspection and city infrastructure decisions.
Lead the design and delivery of a closed-loop intelligence layer that enables an autonomous trading fleet to learn from real-time outcomes and improve profitability.
Help scale production ML infrastructure and retrieval systems at Foxglove to enable high-performance semantic search and data mining over multimodal robotics data.
Deepgram is hiring an ML Ops Infrastructure Engineer to design and operate scalable model deployment, CI/CD, and monitoring systems that deliver production-grade voice AI at scale.
Lead the design and deployment of low-latency, production ML systems for voice, audio, and agentic control at an early-stage hardware and software startup in New York City.
Work with research teams to productionize large-scale generative models, build GPU inference infrastructure, and ensure reliable deployment and observability for production ML workloads.
Work across modeling, systems, and product to design, optimize, and ship production-grade AI systems for real-world users.
A Research Engineer role focused on GPU/kernel and distributed-training optimizations to scale and accelerate real-time world-model AI.
Lead and build True Anomaly’s AI platform and engineering team to deliver production-grade model hosting, agent infrastructure, and enterprise AI tooling that embed AI across the company.
Fundamental is hiring a Model Serving Engineer to build and optimize production inference infrastructure for NEXUS, focusing on Triton-based pipelines, GPU efficiency, and low-latency, high-throughput serving.
Lead ML-driven improvements to ad auction performance by building scalable models, running experiments, and partnering with engineering and product teams at a fast-paced ad tech organization.
Lead the design and productionization of mission-critical NLP and LLM-powered features at Laurel, shaping the AI platform that returns time to professional services firms.
Lead the Core GenerativeAgent team to design, build, and deploy low-latency, enterprise-grade conversational voice AI combining LLMs with speech-to-text, text-to-speech, and real-time streaming pipelines.
Amazon Security seeks a Senior Security Engineer to lead offensive operations and research against AI systems, scaling automated threat emulation across the AI portfolio.
Senior technical role focused on researching, engineering, and scaling privacy-preserving ML and LLM alignment solutions across LinkedIn's platforms.
Decagon is hiring a Senior ML Infrastructure Engineer to design and scale distributed training and multi-provider inference platforms for LLMs and multimodal models.
Metamorphic is hiring an ML Research Engineer (Performance Engineering) to implement and optimize GPU kernels, low-precision training, and MoE systems for next-generation foundation models.
Work on training and deploying large-scale ML systems for physical robots while building the infrastructure and pipelines to operate them in production.
Wizard AI is hiring a Senior MLOps Engineer to own and scale the production ML lifecycle for a real-time inference platform behind a conversational shopping agent.
Andromeda Cluster is hiring an Infrastructure Manager to scale global GPU compute supply and demand matching by sourcing suppliers, optimizing utilization, and negotiating commercial terms.
NVIDIA seeks a seasoned Developer Relations Manager to partner with hyperscaler AI teams, provide hands-on technical enablement for NVIDIA AI software, and drive developer adoption and feedback into the product roadmap.
Lead developer-facing content and sample projects that help ML engineers train, fine-tune, and deploy models on Dexmate humanoid robots while shipping production-quality code weekly.
Help shape Baseten's model ecosystem by combining hands-on engineering, developer education, and product thinking to improve model discovery, evaluation, and adoption.
Senior Staff AI Engineer to lead research and productionization of privacy-preserving ML (differential privacy, federated learning, secure computation) and LLM alignment across LinkedIn’s AI platforms.
Below 50k*
0
|
50k-100k*
0
|
Over 100k*
2
|