Senior Platform & Reliability Engineer

🎨 About OpenArt

OpenArt is an AI Storytelling and Visual Creation Platform used by millions worldwide. We’re building the next generation of creative tools powered by cutting-edge AI, enabling anyone to create videos, visuals, characters, and stories with unprecedented speed and imagination.

We believe the future of creativity is AI-native, and we're shaping that future.

🚀 Why Join OpenArt

Small team, massive surface area: senior engineers own real systems, not slices.
Ship at real scale: your work goes to millions of users, fast.
Founder-led engineering culture: both founders are technical and deeply involved in product and architecture.
AI-native product: you’ll design how cutting-edge AI models are exposed as real user experiences.
High ownership, low process: we value judgment, clarity, and speed over bureaucracy.
7-10X revenue growth over the past 2 years. You’ll play a critical role in helping the company scale to the next stage.

🎯 About the Role

We’re looking for a Senior Platform & Reliability Engineer to help design, scale, and improve the reliability of our infrastructure, from architectural decisions to hands-on implementation, observability, and cost optimization.

This is not a traditional ops or DevOps role. You’ll work across cloud infrastructure, distributed systems, backend services, and developer tooling, making pragmatic decisions that balance product velocity, system reliability, and cost efficiency—in a fast-moving, AI-native environment.

You’ll partner closely with product engineers to evolve the platform that powers OpenArt, contributing to key decisions around infrastructure architecture, improving multi-provider AI reliability, and helping us scale systems to millions of users—while raising the overall engineering bar.

🛠 What You’ll Do

Define and operationalize SLOs/SLIs across critical user journeys (generation, editing, payments/credits, uploads), and use them to guide prioritization and tradeoffs.
Participate in an on-call rotation and improve incident response (alert quality, runbooks, escalation paths), including leading blameless postmortems and driving follow-through on action items.
Improve system resilience at external boundaries (AI providers, storage, etc.), including timeouts, retries, circuit breakers, and fallback strategies.
Build and maintain end-to-end observability (logs, metrics, traces, dashboards) so engineers can quickly understand “what broke” and “why.”
Strengthen deploy safety through CI/CD improvements, automated rollbacks, canary releases, and feature flag patterns.
Contribute to the evolution of our infrastructure architecture, helping evaluate when to extend serverless patterns vs. adopt containerized or more managed approaches as we scale.
Improve cost visibility and efficiency, including per-request cost attribution, caching strategies, and capacity planning.
Act as a strong technical contributor, helping improve engineering practices, tooling, and system design decisions across the team.

🧑‍💻 What We’re Looking For

Core Requirements

5+ years building and operating production systems where reliability and scaling are important.
Strong software engineering skills — you can build and ship production code, not just configure infrastructure.
Experience with cloud-native systems (AWS or GCP), including serverless/event-driven architectures and at least one container-based approach (e.g., ECS/Fargate, Cloud Run, Kubernetes).
Solid understanding of observability and reliability practices: metrics, alerting, tracing, and incident response.
Experience designing resilient systems with external dependencies (timeouts, retries/backoff, idempotency, circuit breakers).
Ability to communicate technical tradeoffs clearly to engineers across different domains.
Comfortable operating in ambiguous, fast-moving environments and taking ownership of problems.

Nice to Have

Experience building internal platform abstractions (e.g., job orchestration, API layers, workflow systems) that improve team velocity.
Track record of improving reliability metrics (e.g., MTTR, SLO attainment, latency) or reducing infrastructure cost.
Experience working in a startup or high-growth environment, with broad ownership across systems.

⚙ Tech Stack You’ll Work With

GCP, Cloud Run, Modal, Upstash, Sentry, Amplitude, Firebase, Redis, React/Next.js, Node.js, TypeScript, Python, etc.

💰 Compensation

Competitive base salary and bonus program
Equity - meaningful ownership in what you build
High autonomy, high growth environment

🌍 Work Setup

Bay Area preferred (hybrid allowed)
Visa sponsorship available
We’ll consider remote

Average salary estimate

$195,000 / year (est.)
Min: $160,000
Max: $230,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.
