Job details

Principal Platform Engineer

About

Edison Scientific builds and commercializes AI agents for science. Scientific discovery moves too slowly, and autonomous AI agents are how we intend to fix that. We're assembling a team of top researchers and engineers across AI and biology to build an AI scientist.

Role

As a Principal Platform Engineer, you'll play a key role in designing, scaling, and operating the core platform infrastructure that powers autonomous scientific discovery. Your primary focus will be orchestrating our agents at scale: building and managing clusters that run thousands of persistent, stateful workloads, developing custom resource definitions (CRDs) and operators, and ensuring the reliability and efficiency of our compute layer.

Our mission is to build an AI scientist, and you'll own the infrastructure foundation it runs on. AI agents performing long-running scientific research demand resilient scheduling, lifecycle management, and resource orchestration far beyond typical cloud-native workloads. This role will influence platform architecture, establish infrastructure best practices, and partner closely with backend engineers, ML engineers, and researchers to deliver a production-grade environment that lets science move faster.

At Edison Scientific, engineering at the senior level is about technical ownership and leverage: understanding how complex systems interact, making sound architectural tradeoffs, and building foundations that allow teams and science to move faster.

This role is on-site at our San Francisco office in the Dogpatch neighborhood. Our office is a converted warehouse with high ceilings, open space, and a team that genuinely believes in what they're building.

This position is part of the Platform team.

Responsibilities

Architect, implement, and operate Kubernetes clusters that support thousands of concurrent, persistent resources (agents, jobs, services) with high availability and efficient resource utilization.
Design and develop custom resource definitions (CRDs) and Kubernetes operators to model and manage domain-specific workloads such as AI agent lifecycles, research pipelines, and long-running compute tasks.
Drive the strategy for cluster scaling, node pool management, autoscaling policies, and resource quota frameworks to handle rapid workload growth.
Build and maintain infrastructure-as-code (Terraform, Pulumi, or similar) for reproducible, version-controlled environment management.
Design and implement robust scheduling, placement, and affinity strategies to optimize cost, performance, and fault tolerance for heterogeneous workloads (CPU, GPU, memory-intensive).
Establish and uphold best practices around observability, monitoring, alerting, and incident response for infrastructure systems (Prometheus, Grafana, Datadog, or similar).
Own storage and networking strategy within Kubernetes — including persistent volume management, CSI drivers, service mesh, network policies, and ingress architecture.
Troubleshoot complex, cross-system infrastructure issues and guide others through effective debugging and remediation in distributed environments.
Collaborate closely with backend, ML, and research teams to understand workload requirements and translate them into reliable infrastructure patterns.

Qualifications

5+ years of professional infrastructure or platform engineering experience, with deep hands-on Kubernetes expertise in production environments.
Experience designing and implementing custom resource definitions (CRDs) and Kubernetes operators (using frameworks such as Kubebuilder, Operator SDK, or controller-runtime).
Track record of operating and scaling Kubernetes clusters supporting thousands of persistent or long-lived resources (stateful workloads, persistent pods, long-running jobs).
Deep understanding of Kubernetes internals — API server, etcd, scheduler, controller manager, kubelet — and how they behave at scale.
Expertise with cloud infrastructure (AWS EKS, GCP GKE, or Azure AKS) and associated networking, storage, and IAM primitives.
Proficiency in at least one systems or backend language for operator development and infrastructure tooling.
Hands-on experience with infrastructure-as-code tools (Terraform, Pulumi, or Crossplane) and GitOps workflows.
Strong working knowledge of container networking (CNI plugins, service mesh, network policies), storage (CSI, persistent volumes, StatefulSets), and security (RBAC, Pod Security Standards, secrets management).
Ability to operate autonomously, make sound technical judgments, and drive projects from concept through production.

Bonus points for:

Experience with data-intensive platforms, scientific computing, or ML/AI infrastructure.
Prior experience at startups or on small teams, with significant architectural ownership in ambiguous environments.
Experience scaling systems, teams, or platforms through periods of rapid growth.

Salary

$200,000 - $350,000 • Offers equity

Why join us?

Competitive salary and equity
Full healthcare coverage — we pay 100% of premiums for you and your dependents
Support for growing families, including a yearly new parent stipend and fertility coverage through Carrot
401(k) company matching
$300 health and wellness benefit
Lunch is on us every day you're in the office, and dinner is on us when you're working late
Regular team offsites and company events
A fast-moving, mission-driven culture where smart people do their best work and actually enjoy doing it





Funding: Early
Department: Software Engineering
Seniority level: Director / Expert
