Fundamental is an AI company pioneering the future of enterprise decision-making. Founded by DeepMind alumni, Fundamental has developed NEXUS – the world's most powerful Large Tabular Model (LTM) – purpose-built for the structured records that actually drive enterprise decisions. Backed by world-class investors and trusted by Fortune 100 companies, Fundamental unlocks trillions of dollars of value by giving businesses the Power to Predict.
At Fundamental, you'll work on unprecedented technical challenges in foundation model development and build technology that transforms how the world's largest companies make decisions. This is your opportunity to be part of a category-defining company from the ground up. Join the team defining the future of enterprise AI.
We are looking for a Model Serving Engineer to own the production inference layer for NEXUS, our Large Tabular Model. You will be responsible for serving models reliably and efficiently at scale, working primarily with Triton Inference Server and building the infrastructure that brings our research directly to customers. This is a deeply technical, Python-heavy role that sits at the intersection of systems engineering and applied ML.
You will work closely with our research and engineering teams to translate model outputs into production-grade inference pipelines that meet strict latency and throughput requirements.
Design, build, and maintain production model serving infrastructure using Triton Inference Server as the primary framework
Implement and optimize inference pipelines including custom backends, dynamic batching strategies, and model ensemble configurations in Triton
Optimize Python inference code for performance, with a strong focus on GIL contention, multi-threading, and concurrency patterns
Tune throughput and latency across the full serving stack: batching policies, thread pool sizing, model instance groups, and memory layout
Work closely with the research team to understand new model architectures at a computational level: batching behavior, dynamic shapes, memory access patterns, etc.
Own the full resource observability and control loop for production inference: instrument GPU memory, CPU utilization, batch queue depth, and latency metrics, and actively tune model instance groups, concurrency limits, memory budgets, and batching configuration in response to observed behavior
Evaluate and integrate alternative inference frameworks and runtimes as the model ecosystem evolves
Contribute to GPU utilization improvements and resource efficiency across the serving fleet
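To illustrate the batching and instance-group tuning described above, here is a sketch of a Triton model configuration (`config.pbtxt`). The model name, tensor names, shapes, and the specific batch sizes and queue delay are illustrative assumptions, not NEXUS's actual configuration:

```protobuf
name: "tabular_model"
backend: "python"
max_batch_size: 64

input [
  {
    name: "FEATURES"
    data_type: TYPE_FP32
    dims: [ -1 ]        # variable-length feature vector per row
  }
]
output [
  {
    name: "PREDICTIONS"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]

# Dynamic batching: Triton coalesces individual requests into server-side
# batches, waiting up to max_queue_delay_microseconds to fill a preferred size.
dynamic_batching {
  preferred_batch_size: [ 16, 32, 64 ]
  max_queue_delay_microseconds: 500
}

# Instance groups: two copies of the model on GPU 0, so one instance can
# execute while the other's batch is being assembled.
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

The interplay between `max_queue_delay_microseconds`, preferred batch sizes, and instance count is exactly the throughput/latency tradeoff this role would own: longer queue delays improve batching efficiency at the cost of tail latency.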
Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent practical experience)
5+ years of experience in model serving, ML infrastructure, or a closely related backend engineering role
Deep, production-level experience with Triton Inference Server, including custom Python backends, batching configuration, and model repository management
Expert-level Python skills with a thorough understanding of the GIL, multi-threading, multiprocessing, and async concurrency patterns
Strong understanding of neural network inference mechanics: forward passes, batching strategies, memory management, and numerical precision tradeoffs
Hands-on experience with other inference frameworks (TorchServe, TensorFlow Serving, ONNX Runtime, vLLM, etc.) and the ability to evaluate tradeoffs between them
Experience profiling and optimizing inference code for latency and throughput at production scale
Experience with GPU kernel-level optimizations or CUDA profiling tools
Familiarity with model quantization, pruning, or compilation toolchains (TensorRT, torch.compile, ONNX)
Experience with KServe or other Kubernetes-native serving platforms
Experience serving tabular or structured data models, including classical ML models such as XGBoost and CatBoost
Experience with observability tooling such as Prometheus, Grafana, or Datadog in the context of inference monitoring
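As a small illustration of the GIL behavior called out above: pure-Python CPU-bound work does not scale across threads, because only one thread executes Python bytecode at a time. A minimal, self-contained sketch (function names are illustrative; real inference code would move the hot path into native code that releases the GIL, or into separate processes):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound(n: int) -> int:
    # Pure-Python arithmetic never releases the GIL, so concurrent
    # threads running this serialize rather than run in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(fn):
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


if __name__ == "__main__":
    N, TASKS = 500_000, 4

    # Sequential baseline.
    t_seq, seq = timed(lambda: [cpu_bound(N) for _ in range(TASKS)])

    # Threaded: roughly the same wall time for CPU-bound Python code,
    # because threads contend on the GIL instead of running in parallel.
    with ThreadPoolExecutor(max_workers=TASKS) as ex:
        t_thr, thr = timed(lambda: list(ex.map(cpu_bound, [N] * TASKS)))

    assert seq == thr
    print(f"sequential: {t_seq:.3f}s  threaded: {t_thr:.3f}s")
```

The same experiment run against a workload that releases the GIL (NumPy, I/O, or a C-extension forward pass) would show threads scaling normally, which is why knowing *where* the GIL is held matters when tuning a Python-backed serving stack.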
Competitive compensation with salary and equity
Comprehensive health coverage (medical, dental, and vision) and a 401(k) plan
Paid parental leave for all new parents, inclusive of adoptive and surrogate journeys
Relocation support for employees moving to join the team in one of our office locations
A mission-driven, low-ego culture that values diversity of thought, ownership, and bias toward action