Job details

AI Inference Engineer - Model Optimization & Deployment

The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence.

As a Model Optimization & Deployment Engineer, you will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience in compressing, accelerating, and deploying complex models (LLMs, VLMs, or FMs) for power- and thermal-constrained vehicle SoCs. You will optimize ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.

In this role, you will:

Optimize large-scale models (LLMs, VLMs) using advanced quantization (PTQ, QAT), mixed-precision inference workflows, and parameter-efficient fine-tuning (LoRA, QLoRA).
Architect and implement model conversion and compilation pipelines using TensorRT and TensorRT-LLM for edge deployment.
Perform rigorous parity checking, accuracy recovery, and latency benchmarking between PyTorch frameworks and compiled edge binaries.
Write and optimize custom CUDA kernels and TensorRT Plugins to maximize memory bandwidth and minimize latency on AI accelerators.
Write production-level, highly concurrent, and memory-safe C++ and Python code for real-time inference on vehicle SOCs.

Qualifications:

Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference workflows (INT8, FP8, INT4, BF16/FP16).
Proven experience optimizing large-scale models (LLMs, VLMs, or VLAs) utilizing KV-cache optimization (e.g., PagedAttention), Speculative Decoding, and Efficient Attention mechanisms (FlashAttention, Linear Attention).
Extensive experience with model conversion/compilation pipelines (TensorRT, TensorRT-LLM) and performing rigorous parity/latency benchmarking.
Proficiency in low-level programming for AI accelerators, specifically writing and optimizing custom CUDA kernels and TensorRT Plugins.
Production-level C++ (14/17/20) and Python programming skills, with experience writing concurrent, memory-safe, real-time inference code for edge devices.

Bonus Qualifications:

Experience with distributed training pipelines and model/tensor parallelism (PyTorch Distributed, Ray, DeepSpeed, Megatron-LM) and runtime efficiency optimization for GPU clusters.
Familiarity with autonomous driving perception stacks (temporal 3D object detection, BEV, 3D Occupancy Networks) and processing multi-modal sensor streams (Vision, LiDAR, Radar).
Understanding of end-to-end autonomous driving paradigms (VLA models, closed-loop simulation validation).

Base Salary Range: $242,000 - $290,000 a year

There are three major components to compensation for this position: salary, Amazon Restricted Stock Units (RSUs), and Zoox Stock Appreciation Rights. A sign-on bonus may be offered as part of the compensation package. The listed range applies only to the base salary. Compensation will vary based on geographic location and level. Leveling, as well as positioning within a level, is determined by a range of factors, including, but not limited to, a candidate's relevant years of experience, domain knowledge, and interview performance. The salary range listed in this posting is representative of the range of levels Zoox is considering for this position.

Zoox also offers a comprehensive package of benefits, including paid time off (e.g. sick leave, vacation, bereavement), unpaid time off, Zoox Stock Appreciation Rights, Amazon RSUs, health insurance, long-term care insurance, long-term and short-term disability insurance, and life insurance.

About Zoox

Zoox is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem required to bring this technology to market. Sitting at the intersection of robotics, machine learning, and design, Zoox aims to provide the next generation of mobility-as-a-service in urban environments. We’re looking for top talent that shares our passion and wants to be part of a fast-moving and highly execution-oriented team.

Accommodations

If you need an accommodation to participate in the application or interview process, please reach out to accommodations@zoox.com or your assigned recruiter.

A Final Note:

You do not need to match every listed expectation to apply for this position. Here at Zoox, we know that diverse perspectives foster the innovation we need to be successful, and we are committed to building a team that encompasses a variety of backgrounds, experiences, and skills.

Zoox

Zoox was founded to make personal transportation safer, cleaner, and more enjoyable—for everyone. To achieve that goal, the team created a whole new form of transportation. Zoox will provide mobility-as-a-service in dense urban environments.

FUNDING

Other

DEPARTMENTS

Software Engineering

SENIORITY LEVEL REQUIREMENT

Senior Level

INDUSTRY

Computer Hardware Development

TEAM SIZE