Dropbox

Dropbox is a technology company that builds simple, powerful products for individuals and businesses. With over 700 million registered users worldwide, Dropbox offers file sync, sharing, online backup, cloud storage, collaboration tools, and more to st...

Internet Software & Services

Information Technology

1K-5K (3118)

Founded 2007

61 open positions

Site Reliability Engineer

1 month, 4 weeks ago

Mexico

Senior

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Ansible AWS Bash Chef Datadog DHCP DNS Docker EC2 GitHub GitHub Actions GitOps Kubernetes Linux Python REST API RHEL Serverless Terraform Ubuntu WAF

Description

Ensure the reliability, scalability, and performance of Dropbox's infrastructure and services.
Collaborate with cross-functional teams to define and maintain best practices for monitoring, logging, and incident response.
Build, implement, and maintain automation and infrastructure-as-code tooling, including Terraform, Ansible, GitHub Actions, and custom-built platforms.
Use container orchestration platforms such as Kubernetes, Amazon ECS, and Red Hat OpenShift to manage containers at scale.
Manage and optimize monitoring and logging pipelines using tools such as Datadog and Cribl LogStream.
Drive service health and visibility improvement projects for stakeholders across technical and business groups.
Develop and maintain custom tooling and automation scripts in Bash, Python, and other scripting languages.
Handle incidents and occasional on-call duties to address bugs, outages, and other operational issues.
Contribute to the evolution of infrastructure while ensuring security and scalability.

Requirements

5+ years of experience in site reliability engineering or a similar engineering role with hands-on coding experience.
Strong knowledge of AWS services, including EC2, S3, RDS, Route 53 (R53), Lambda, and others.
Strong knowledge of Linux administration and internals (filesystems, volume management), the Ubuntu and RHEL distributions, DNS, and DHCP.
Experience with monitoring and logging tools such as Datadog and pipeline tools such as Vector or Cribl LogStream.
Experience driving transformational programs related to metrics and observability.
Experience with scripting in a higher-level language, with Python preferred.
Experience developing automation for infrastructure-related tasks using Chef, Ansible, or Terraform.
Experience with log analysis and building metrics, alerts, and visuals from log data.
Strong proficiency in infrastructure-as-code tools, such as Terraform.
Strong proficiency in configuration management tools, specifically Ansible Automation Platform and Chef.
Experience with containerization technologies such as Docker and orchestration platforms like Kubernetes or Amazon ECS.
Knowledge of LDAP, REST APIs, and modern authentication systems.
Familiarity with GitHub and Git-based workflows.
Understanding of Amazon RDS databases and network security technologies such as WAF.
Strong problem-solving skills and the ability to work well in a fast-paced, collaborative environment.
Excellent written and verbal communication skills.
Experience managing large-scale multi-cloud or hybrid infrastructure.
Strong background in infrastructure as code and GitOps workflows.
Familiarity with Kubernetes, Docker, and serverless platforms.
Proven track record improving observability, reliability, and incident response.
Understanding of compliance and security frameworks such as SOC 2, ISO 27001, and FedRAMP.
Experience implementing Zero Trust security and access models.

Similar Roles

Staff Operations Engineer

Mozilla 251-1K Internet Software & Services

Mozilla is hiring a Staff Operations Engineer to lead the design, reliability, and evolution of hybrid-cloud and workplace infrastructure across teams.

Canada Full-time Lead Infrastructure Engineer Site Reliability Engineer (SRE)

$86k-$127k

Ansible DNS Linux Puppet Python TCP/IP Unix

18 hours, 45 minutes ago

Principal Site Reliability Engineer (SRE)

Symmetrio Professional Services

Symmetrio is recruiting a Principal Site Reliability Engineer for a rapidly growing healthcare technology company to own the reliability, scalability, security, and performance of a mission-critical SaaS platform used by healthcare providers across the United States.

United States Full-time Lead Site Reliability Engineer (SRE)

Active Directory AWS CI/CD Datadog Django Grafana Kubernetes Python Terraform Windows Server

19 hours ago

Performance Test Engineer Lead

PartnerOne 51-250 Media

An enterprise performance engineering role at a cloud-focused organization, responsible for validating the scalability, stability, and production readiness of distributed systems across Azure and hybrid environments.

Egypt Full-time Lead QA Engineer Site Reliability Engineer (SRE)

Azure CI/CD Kubernetes PowerShell

19 hours, 15 minutes ago