Job details

Founding Platform & Reliability Engineer

🎨 About OpenArt

OpenArt is an AI Storytelling and Visual Creation Platform used by millions worldwide. We’re building the next generation of creative tools powered by cutting-edge AI, enabling anyone to create videos, visuals, characters, and stories with unprecedented speed and imagination. We believe the future of creativity is AI-native, and we're shaping that future.

🚀 Why Join OpenArt

Small team, massive surface area: senior engineers own real systems, not slices.
Ship at real scale: your work goes to millions of users, fast.
Founder-led engineering culture: both founders are technical and deeply involved in product and architecture.
AI-native product: you’ll design how cutting-edge AI models are exposed as real user experiences.
High ownership, low process: we value judgment, clarity, and speed over bureaucracy.
7-10X revenue growth for the past 2 years: you’ll play a critical role in helping the company scale to the next stage.

🎯 About the Role

We’re looking for a Founding Platform & Reliability Engineer who can own the design, scalability, and reliability of our entire infrastructure stack end-to-end, from high-level architecture decisions to hands-on implementation, observability, and cost optimization.

This is NOT a role for traditional operators or narrow DevOps specialists. You should be comfortable working across cloud infrastructure, distributed systems, backend services, and developer tooling, making pragmatic decisions that balance product velocity, system reliability, and cost efficiency—especially in a fast-evolving, AI-native environment.

You will work closely with the founders and product engineers to design and evolve the platform that powers OpenArt, shaping key decisions such as serverless vs. containerized architecture, multi-provider AI reliability, and scaling systems to millions of users—while acting as a force multiplier for the entire engineering team.

🛠 What You’ll Do

Define and operationalize SLOs/SLIs across critical user journeys (generation, editing, payments/credits, uploads, etc.), and use them, including error budgets, to drive prioritization.
Participate in an on-call rotation and lead incident response improvements (alert quality, runbooks, escalation paths). Establish blameless postmortems and ensure action items are implemented.
Implement reliability patterns at external boundaries, and build mechanisms for per-vendor “health” measurement and routing/fallback policies.
Stand up end-to-end observability: structured logs, metrics, traces, and dashboards that let engineers answer “what broke” and “why now” quickly.
Build deploy safety practices: automated rollbacks, canarying, feature-flag patterns, and reliable CI/CD gates.
Own the direction of our infrastructure architecture, including defining when serverless is the right approach versus when we should evolve toward containerized or more managed systems, and guiding the team through those transitions as we scale.
Build cost observability and cost-control primitives: per-request cost attribution, caching strategies, capacity planning, and budget alerts.
Act as a senior technical voice, influencing architecture, tooling, engineering best practices, and raising the overall engineering bar.

🧑‍💻 What We’re Looking For

Core Requirements

5+ years building and operating production systems where reliability and scaling are core.
Strong software engineering skills (you can ship production code, not just configure tools).
Cloud-native experience (AWS or GCP), ideally with serverless/event-driven systems and at least one container path (Fargate/ECS/Cloud Run/Kubernetes).
Deep knowledge of observability practices: dashboards, alerting, distributed tracing, and incident response maturity.
Ability to design resilient interactions with external dependencies (timeouts, retries/backoff/jitter, circuit breakers, idempotency).
Ability to communicate tradeoffs clearly to non-infra peers.
Ability to operate with ambiguity and define problems before solving them.

Nice to Have

Have designed an internal platform abstraction (e.g., API gateway / workflow engine / job orchestration) that enabled multiple product teams to ship faster with fewer incidents.
Have shipped concrete reliability outcomes, e.g., reduced MTTR, improved SLO attainment, lowered p95 latency, or reduced infra/unit costs.
Prior startup experience or experience owning large surface-area features.

⚙ Tech Stack You’ll Work With

GCP, Cloud Run, Modal, Upstash, Sentry, Amplitude, Firebase, Redis, React / Next.js, Node.js, TypeScript, Python, etc.

💰 Compensation

Competitive base salary and bonus program
Equity: meaningful ownership in what you build
High-autonomy, high-growth environment

🌍 Work Setup

Bay Area preferred (hybrid allowed)
Visa sponsorship available
We’ll consider remote


Average salary estimate: $205,000 per year (est.)
Min: $170,000
Max: $240,000



Embedding VC (11 jobs)


Funding: Seed
Department: Software Engineering
Seniority level: Senior Level