Staff Site Reliability Engineer, Production Engineering

3 weeks, 1 day ago
Full-time
Lead
DevOps and Infrastructure
Dropbox

Dropbox is a technology company that builds simple, powerful products for individuals and businesses. With over 700 million registered users worldwide, Dropbox offers file sync, sharing, online backup, cloud storage, collaboration tools, and more to st...

Internet Software & Services
1K-5K
Founded 2007

Description

  • Define and evolve Dropbox’s company-wide technical reliability strategy for AI-assisted and agentic software development.
  • Set multi-year reliability goals, standards, and roadmaps across observability, debugging, incident management, service health, and operational readiness.
  • Lead cross-team initiatives that reduce reliability risk as delivery velocity, pull request volume, service complexity, and incident volume increase.
  • Partner with engineering leaders and platform teams to improve monitoring, alerting, debugging, SLOs, SLAs, and incident response systems.
  • Identify emerging reliability risks in AI-enabled development workflows and design scalable systems, processes, and guardrails to mitigate them.
  • Provide technical leadership and mentorship to engineers across teams to improve engineering quality and operational excellence.
  • Drive communication and alignment with senior stakeholders on reliability priorities, tradeoffs, risks, and execution progress.
  • Participate in on-call rotations where required, on teams that operate services with on-call coverage.

Requirements

  • BS degree in Computer Science or a related technical field involving coding, or equivalent technical experience.
  • 12+ years of experience in software engineering, site reliability engineering, infrastructure engineering, or related technical roles.
  • Proven ability to define and deliver multi-year, multi-team reliability, infrastructure, or platform strategies with measurable business and customer impact.
  • Deep experience with distributed systems, production operations, observability, incident response, SLOs/SLAs, debugging, and reliability risk management.
  • Demonstrated ability to diagnose complex technical problems, debug production systems, automate operational workflows, and design resilient software components.
  • Experience influencing engineering roadmaps across multiple teams and making technical decisions for the broader organization.
  • Strong communication and collaboration skills with the ability to align cross-functional stakeholders through ambiguity and drive execution.
  • Experience adapting reliability strategies, developer tooling, or operational processes for AI-assisted software development workflows (preferred).
  • Experience building or scaling observability, debugging, incident management, or developer productivity platforms for large engineering organizations (preferred).
  • Experience leading reliability improvements in environments with high deployment velocity, complex service dependencies, and large-scale production systems (preferred).
  • Track record of mentoring senior engineers, setting technical standards, and spreading reliability best practices (preferred).
  • Familiarity with AI-enabled tooling, agentic development workflows, or operational risks introduced by rapid automation in the software development lifecycle (preferred).

Benefits

  • Competitive salary range of $223,400–$302,200 USD for US Zone 2.
  • Competitive salary range of $198,600–$268,600 USD for US Zone 3.
  • Opportunity to work on company-wide reliability strategy for a major engineering organization.
  • Role is focused on shaping reliability practices in an AI-enabled development environment.
  • On-call rotations may be part of the role for teams operating services.
