Staff Site Reliability Engineer

1 month ago
Lead
DevOps and Infrastructure
Puck

Puck helps great teams find great teammates through employer branding, conversations, and authentic candidate engagement, using personalized automation to enhance the candidate experience and improve hiring metrics.

Internet Software & Services
1-10
Founded 2020

Description

  • Lead the development of internal AI-assisted reliability tooling that analyzes tickets, logs, traces, and documentation to speed up outage resolution.
  • Improve observability coverage and signal quality for critical customer-facing systems across the development and support lifecycle.
  • Own incident response end-to-end, from detection through remediation, and improve documentation and learning after incidents.
  • Guide the development of customer- and user-facing observability tools within Domino’s products.
  • Define and mature SLO and SLI frameworks for priority services.
  • Scale cloud operations practices for Domino’s single-tenant SaaS offering.
  • Work with engineering teams to improve the reliability and repeatability of customer deployments and upgrades.
  • Mentor other engineers and help shape SRE workflows, operational readiness, and post-incident learning culture.

Requirements

  • Deep experience in Site Reliability Engineering, platform engineering, or a software engineering role with hands-on operational ownership.
  • Fluency with Kubernetes, Linux, cloud platforms, and observability tooling.
  • Ability to investigate complex real-world production problems using operational tooling and signals.
  • Strong software engineering skills in Python or Go.
  • Track record of building internal tools or services that people rely on.
  • Comfort leading technically ambiguous work and influencing across teams without direct authority.
  • History of improving reliability through engineering and automation.
  • Strong communication skills and experience mentoring engineers or shaping technical decisions.
  • Sound judgment about AI/LLM tooling, including when it is useful and when it adds noise.
  • Bonus: Experience with LLM-based systems, retrieval workflows, SaaS platform operations, or tooling for support or developer teams.

Benefits

  • Remote-first role (tagged #LI-Remote).
  • Opportunity to work on high-impact reliability tooling for AI-driven customer solutions.
  • Chance to help define and shape the SRE practice at Domino.
  • Work on a startup-style team backed by leading investors.
