Dropbox

Dropbox is a technology company that builds simple, powerful products for individuals and businesses. With over 700 million registered users worldwide, Dropbox offers file sync, sharing, online backup, cloud storage, collaboration tools, and more to st...

Internet Software & Services

Information Technology

1K-5K (3118)

Founded 2007

27 open positions

Staff Site Reliability Engineer, Production Engineering

1 month, 2 weeks ago

Canada

Full-time

Lead

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Description

Define and evolve Dropbox’s company-wide technical reliability strategy for an AI-assisted engineering environment.
Set multi-year goals, standards, and roadmaps for observability, debugging, incident management, service health, and operational readiness.
Lead cross-team efforts to reduce reliability risk as deployment velocity, pull request volume, service complexity, and incident volume increase.
Partner with engineering leaders and platform teams to improve monitoring, alerting, debugging, SLOs, SLAs, and incident response systems.
Identify emerging reliability risks from AI-enabled workflows and design scalable systems, processes, and guardrails to mitigate them.
Provide technical leadership and mentorship to engineers across teams to raise engineering quality and operational excellence.
Drive communication and alignment with senior stakeholders on reliability priorities, tradeoffs, risks, and execution progress.

Requirements

BS degree in Computer Science or a related technical field involving coding, or equivalent technical experience.
12+ years of experience in software engineering, site reliability engineering, infrastructure engineering, or related technical roles.
Proven ability to define and deliver multi-year, multi-team reliability, infrastructure, or platform strategies with measurable business and customer impact.
Deep experience with distributed systems, production operations, observability, incident response, SLOs/SLAs, debugging, and reliability risk management.
Demonstrated ability to diagnose complex technical problems, debug production systems, automate operational workflows, and design resilient software components.
Experience influencing engineering roadmaps across multiple teams and making technical decisions for the broader engineering organization.
Strong communication and collaboration skills, with the ability to align cross-functional stakeholders through ambiguity and drive execution across teams.
Experience adapting reliability strategies, developer tooling, or operational processes for AI-assisted software development workflows (preferred).
Experience building or scaling observability, debugging, incident management, or developer productivity platforms for large engineering organizations (preferred).
Experience leading reliability improvements in environments with high deployment velocity, complex service dependencies, and large-scale production systems (preferred).
Track record of mentoring senior engineers, setting technical standards, and spreading reliability best practices through documentation, reviews, talks, or architecture guidance (preferred).
Familiarity with AI-enabled tooling, agentic development workflows, or operational risks introduced by rapid automation in the software development lifecycle (preferred).

Benefits

Canada pay range: $204,900 to $277,200 CAD.
On-call rotations may be part of the role, with availability during both core and non-core business hours.
Opportunity to work on company-wide reliability strategy for a major engineering organization.
Exposure to shaping long-term platform investments and operational practices at scale.

Similar Roles

Ssr Monitoring and Observability Analyst

Coderio | 51-250 | Internet Software & Services

Coderio is hiring an Observability & Monitoring Analyst to design and operate monitoring systems that improve availability, performance, and incident response across global clients’ IT environments.

Mexico, Colombia, Argentina, Uruguay | Contract | Mid Level | Site Reliability Engineer (SRE)

AWS, Azure, Bash, Datadog, DNS, Docker, ELK Stack, Fluentd, GCP, Grafana, Jaeger, Kibana, Kubernetes, Linux, Load Balancing, Logstash, New Relic, OpenTelemetry, Prometheus, Python, TCP/IP, Zipkin

2 minutes ago

Senior Site Reliability Engineer

Counterpart Health | 51-200 | Hospital & Health Care

Counterpart Health is hiring a Senior Site Reliability and Infrastructure Engineer to support and evolve the technology platform behind its primary care tool and maintain reliable infrastructure for domestic and international workloads.

United States | Full-time | Senior | Site Reliability Engineer (SRE)

$160k-$208k

AWS, Azure, CI/CD, Containerd, DNS, Docker, GCP, Go, gRPC, Helm, Kubernetes, Linux, Load Balancing, Prometheus, Python, Shell Scripting, TCP/IP

1 day, 23 hours ago

Senior Test Platform & Reliability Engineer - Star Trek Fleet Command

Scopely | 1K-5K | Internet Software & Services

Scopely is hiring a Senior Test Platform & Reliability Engineer in Ireland to build validation, reliability, and developer enablement platforms for Star Trek Fleet Command’s large-scale live-service backend systems.

Ireland | Full-time | Senior | SDET (Software Development Engineer in Test), Site Reliability Engineer (SRE)

AWS, Bash, CI/CD, Docker, GitLab, Go, Python, Terraform

2 days ago

Senior Software Engineer - Databases, SRE | Canada | Remote

Grafana | 1K-5K | IT Services

Grafana Labs is hiring a Senior Software Engineer for its remote SRE team to improve reliability and operability of Grafana Cloud database services for high-SLA customers across AWS, GCP, and Azure.

Canada | Full-time | Senior | Site Reliability Engineer (SRE), Software Engineer

$108k-$130k

AWS, Azure, GCP, Go, Helm, Java, Kubernetes, Linux, Microservices, Python, Terraform

2 days, 23 hours ago
