Puck

Puck helps great teams find great teammates through employer branding, conversations, and authentic candidate engagement, using personalized automation to enhance the candidate experience and improve hiring metrics.

Internet Software & Services

Information Technology

1-10 (10)

Founded 2020

13 open positions

Staff Platform Reliability Engineer

1 month, 1 week ago

United States

Full-time

Lead

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Grafana, k6, Kubernetes, Locust, New Relic, Prometheus, Python

Description

Serve as the technical owner of Tempest, Domino's scale and reliability platform.
Diagnose and resolve performance bottlenecks and resource misconfigurations surfaced by scale testing.
Profile services and trace root causes using observability data from Prometheus and New Relic.
Partner with platform and infrastructure teams to ship durable fixes rather than only filing tickets.
Deliver accurate, data-driven sizing recommendations for customer-facing documentation.
Strengthen observability by improving instrumentation, dashboards, and queries for scale testing.
Establish and operationalize scale testing on cloud platforms with appropriate sizing and configuration guidance.
Enable scale and reliability testing across additional cloud providers in partnership with platform teams.
Build infrastructure automation that improves operational efficiency as the product and customer base grow.

Requirements

Background in SRE, platform engineering, or infrastructure.
Hands-on experience operating and troubleshooting distributed systems in production Kubernetes environments.
Strong proficiency in Python.
Comfort working in a large, modular codebase spanning orchestration, infrastructure automation, and systems integration.
Experience with observability stacks such as Prometheus, Grafana, New Relic, or similar.
Ability to write queries, build dashboards, and use metrics to diagnose performance and reliability issues.
Demonstrated ability to profile services, identify resource bottlenecks, and drive durable fixes with engineering teams.
Familiarity with performance and load testing tools or methodologies such as Locust, k6, or similar.
Self-directed, accountable ownership mindset.
Ability to communicate priorities and status effectively in a remote, async environment.

Benefits

Annual US base salary range of $185,000 to $230,000.
Additional equity may be included.
Company bonus or sales commissions/bonuses may be included.
401(k) plan.
Medical, dental, and vision benefits.
Wellness stipends.
Remote role.

Similar Roles

Site Reliability Engineer

Alpaca 51-250 Capital Markets

Alpaca is hiring a Site Reliability Engineer to keep its brokerage platform reliable and operable across cloud, Kubernetes, observability, messaging, and database systems, with a strong focus on PostgreSQL reliability on the trading-critical path.

Europe Full-time Mid Level Site Reliability Engineer (SRE)

DNS GitOps Go Kafka Kubernetes Linux Load Balancing PostgreSQL Python RabbitMQ Secrets Management TLS

2 hours, 4 minutes ago

Site Reliability Engineer

Kaseya 1K-5K IT Services

Kaseya is hiring a Site Reliability Engineer to own the reliability, automation, and production stability of its AWS-based services used by thousands of MSPs worldwide.

Canada Full-time Mid Level Site Reliability Engineer (SRE)

$85k-$96k

Ansible AWS Chef CloudFormation Datadog DevSecOps Elasticsearch Kibana Kubernetes MySQL PostgreSQL Puppet Secrets Management Serverless Terraform

6 hours, 4 minutes ago

SRE - DevOps Engineer - Argentina

Coderio 51-250 Internet Software & Services

Coderio is hiring a remote DevOps/SRE Engineer in Argentina to ensure the stability, scalability, and efficient operation of the infrastructure that supports its global digital solutions.

Argentina Full-time Mid Level Site Reliability Engineer (SRE)

Argo CD CI/CD Flux GitHub Actions GitOps Helm Jenkins Kubernetes OpenShift Terraform

9 hours, 44 minutes ago

Senior Site Reliability Engineer

Cribl 251-1K IT Services

Cribl is hiring a Senior Site Reliability Engineer in Poland to help build and operate the telemetry infrastructure and observability platform that supports its cloud products and enterprise customers.

Poland Full-time Senior Site Reliability Engineer (SRE)

Ansible AWS Azure CI/CD Grafana JavaScript Kibana Linux New Relic Node.js PagerDuty Prometheus Splunk Terraform TypeScript

17 hours, 17 minutes ago
