Browse 61 Site Reliability jobs hiring now. Companies hiring include Clarity Innovations, OCC, and MongoDB, with roles in Bakersfield, Norfolk, and Columbus.
Experienced Site Reliability Engineer needed to lead observability, automation, and data-focused reliability efforts for cloud-based national security systems in a collaborative, mission-driven environment.
An experienced SRE/DevOps professional is needed to architect automation, observability, and runbooks for OCC's critical clearing platform while mentoring teammates and improving reliability.
Experienced SRE with a strong infrastructure background wanted to help operate, automate, and scale MongoDB Atlas across multi-cloud environments.
NVIDIA is hiring a Senior Staff Software Engineer to design agentic AI automation and build integrations to transform enterprise IT operations and prevent problems at scale.
Lead the design and scaling of enterprise-grade, reliable cloud platforms as an SRE Architect working with cross-functional teams in a hybrid Austin, TX environment.
Lead the design and operation of Axle Health's secure, scalable AWS infrastructure and CI/CD pipelines to support enterprise-grade, HIPAA-compliant in-home healthcare software.
The University of Chicago's CTDS is hiring a Senior Platform Engineer to lead production support, CI/CD pipelines, monitoring, and security automation across hybrid cloud and on‑prem translational data science platforms.
Trimble is seeking a Site Reliability Engineer to strengthen and scale Vista Cloud infrastructure for enterprise AECO customers by delivering automation, robust monitoring, and deep technical support.
Be the engineer who designs and operates large-scale Linux infrastructure, CI/CD pipelines, and automation to power Intel's architecture modeling and simulation workflows.
Ro is hiring a Senior Site Reliability Engineer to strengthen and scale our AWS-based infrastructure, improve uptime and MTTR, and help embed reliability practices across the engineering organization.
Hudu is hiring an experienced DevOps Engineer to operate and optimize its Rails-based SaaS infrastructure on AWS and Kubernetes, focusing on reliability, security, and performance.
SpaceX is hiring a Site Reliability Engineer to build and operate mission-critical application infrastructure that accelerates and secures vehicle and satellite software delivery.
Visa is hiring a Software Development Engineer on the Product Reliability Engineering team to build scalable automation, database platform tooling, and GenAI-powered reliability solutions for global payment infrastructure.
Nabla seeks a senior SRE/Backend engineer to drive platform reliability and scalability for its clinical AI systems supporting clinicians across the US and EU.
Lead PlayStation's Service Reliability Engineering team to own global uptime, stability, and operational excellence for FTG's cloud gaming infrastructure.
Hammerhead is hiring a Site Reliability Engineer to establish and run the reliability function for an AI-driven power orchestration platform deployed across cloud and on-prem data centers.
Lead a distributed SRE team at LexisNexis Risk Solutions to design and operate secure, automated, cloud-native infrastructure and drive on‑prem-to‑cloud migrations using Terraform, Azure, and modern CI/CD patterns.
Homebot seeks a Senior DevOps Engineer to lead multi-cloud (AWS and GCP) infrastructure design, operation, and developer enablement for our platform.
Stitch Fix is hiring a Platform Engineer to enhance cloud-native infrastructure, developer tooling, and CI/CD workflows to improve developer experience across the company.
Lead the architecture and operation of production-scale GPU clusters at Andromeda, partnering with customers to maximize distributed training reliability and performance.
Anduril's Discovery team is hiring a Site Reliability Engineer to design and operate scalable, secure deployments that integrate cloud, robotics, and mesh networking for mission-critical systems.
Kochava is hiring a Senior Site Reliability Engineer to develop and operate scalable, highly available infrastructure and tooling across cloud and on-prem environments.
Anduril's Discovery team is hiring a DevOps Software Engineer to design and operate CI/CD, IaC, containerized deployments, and MLOps pipelines for high-impact autonomy and networking systems.
Experienced SRE/DBA skilled in SQL Server, system administration, and cloud operations needed to ensure high availability and performance of Intelerad's medical imaging platforms.
ServiceNow seeks a Staff Site Reliability Engineer to drive performance troubleshooting, incident escalation, and availability improvements across its cloud platform while working directly with customers and engineering teams.
HomeVision is hiring an Associate Site Reliability Engineer to help scale its AWS/Terraform platform, improve reliability and observability, and support IT and product initiatives in a fully remote environment.
Lead site reliability and platform engineering efforts at WGU as a Senior Software Engineer, building scalable, cloud-aware systems that power the university's online learning platform.
Crusoe is hiring a Software Engineer to help design and scale highly available distributed systems and build platform tools that power sustainable AI infrastructure.
Medtronic is hiring a Principal Software Cloud Engineer to architect and implement cloud-native microservices for CRM Software at its Minneapolis site.
Workday Government is hiring an SRE-focused software engineer to operate, troubleshoot, and harden large-scale cloud services for U.S. federal customers, requiring U.S. citizenship and clearance eligibility.
Lead and grow an engineering team building scalable, secure enterprise infrastructure and backend systems for LinkedIn’s Mountain View hybrid environment.
IonQ is hiring a Senior Manager, Software Engineering to lead and grow the System Operations Software team responsible for building scalable, reliable software for quantum systems.
Lead ServiceNow CMDB and ETL engineering efforts at Visa to design, build, and operate reliable discovery, ingestion, and data pipelines supporting enterprise CMDB and ITOM capabilities.
Lead observability and SRE efforts for high-availability government digital services at Mighty Acorn, building monitoring, incident response practices, and mentoring engineers across an AWS-based stack.
Lead Site Reliability Engineer needed to own SLO-driven reliability, Infrastructure as Code, and observability for athenahealth's hybrid cloud infrastructure while mentoring SRE teams.
Weedmaps is hiring a remote Site Reliability Engineer to strengthen observability, CI/CD, and containerized production reliability for its cloud-native services.
Senior Technology Director needed to lead cloud, DevOps, and platform modernization initiatives for a large member-facing organization, driving strategy, engineering leadership, and secure scalable delivery.
Lead the design and automation of cloud-native AWS infrastructure and AI-assisted operational tooling to drive resilience and efficiency across DraftKings' real-time platform.
Lead the Consumer Lending domain's SRE efforts at Toyota Financial Services to drive observability, automation, and high availability for mission-critical applications.
Senior Software Engineer (remote) needed to develop and operate a full-stack observability platform for a high-growth SaaS company focused on reliability and user-centered solutions.
Sysdig is hiring a Senior Software Engineer for the Data Platform team to architect and implement scalable Go-based data pipelines and drive technical direction for cloud-scale telemetry and analytics.
Lead the design and operation of secure, scalable cloud infrastructure for Anduril's Corporate Technology team as a Senior Site Reliability Engineer focused on reliability, automation, and observability.
Experienced reliability engineer needed to drive automation, observability, incident response, and SLO-driven operations for mission-critical cloud and hybrid systems supporting a U.S. Air Force program.
Lead production reliability efforts for Anduril's Lattice platform by building observability, automation, and scalable infrastructure solutions that keep mission-critical systems operational 24/7.
Work on core software and infrastructure at Dimensional to shape scalable, reliable systems that power general-purpose robotics.
Bluefish seeks a Senior Data Acquisition Engineer to design, operate, and scale production-grade web scraping and ingestion systems that power AI-driven marketing insights.
Lead reliability and security for a distributed GPU marketplace, driving SLOs, incident response, capacity automation, and secure rollouts to ensure 24/7 platform availability.
Cortex, a Series C engineering-operations platform, is hiring a Senior Backend Software Engineer to build scalable, reliable backend systems that power developer productivity for enterprise customers.
Pismo (part of Visa) is hiring a Senior Network Platform SRE to design, automate, and operate secure, resilient hybrid and multi-cloud network topologies with a focus on Azure.
Lead the Validation Engineering organization to design and operate self-service, policy-driven validation and reliability platforms that enable safe, high-velocity production changes at scale for LinkedIn.
Below 50k*: 0 | 50k-100k*: 2 | Over 100k*: 31