Dropbox

Dropbox is a technology company that builds simple, powerful products for individuals and businesses. With over 700 million registered users worldwide, Dropbox offers file sync, sharing, online backup, cloud storage, collaboration tools, and more to st...

Internet Software & Services
1K-5K
Founded 2007

Description

  • Ensure the reliability, scalability, and performance of Dropbox's infrastructure and services.
  • Collaborate with cross-functional teams to define and maintain best practices for monitoring, logging, and incident response.
  • Build, implement, and maintain automation and infrastructure-as-code tooling, including Terraform, Ansible, GitHub Actions, and custom code platforms.
  • Use container orchestration platforms such as Kubernetes, Amazon ECS, and Red Hat OpenShift to manage containers at scale.
  • Manage and optimize monitoring and logging pipelines using tools such as Datadog and Cribl LogStream.
  • Drive service health and visibility improvement projects for stakeholders across technical and business groups.
  • Develop and maintain custom tooling and automation scripts in Bash, Python, and other scripting languages.
  • Handle incidents and occasional on-call duties to address bugs, outages, and other operational issues.
  • Contribute to the evolution of infrastructure while ensuring security and scalability.

Requirements

  • 5+ years of experience in site reliability engineering or a similar engineering role with hands-on coding experience.
  • Strong knowledge of AWS services, including EC2, S3, RDS, Route 53, Lambda, and others.
  • Strong knowledge of Linux administration, internals, filesystems, volume management, Ubuntu, RHEL, DNS, and DHCP.
  • Experience with monitoring and logging tools such as Datadog and pipeline tools such as Vector or Cribl LogStream.
  • Experience driving transformational programs related to metrics and observability.
  • Experience with scripting in a higher-level language, with Python preferred.
  • Experience developing automation for infrastructure-related tasks using Chef, Ansible, or Terraform.
  • Experience with log analysis and building metrics, alerts, and visuals from log data.
  • Strong proficiency in infrastructure-as-code tools, such as Terraform.
  • Strong proficiency in configuration management tools, specifically Ansible Automation Platform and Chef.
  • Experience with containerization technologies such as Docker and orchestration platforms like Kubernetes or Amazon ECS.
  • Knowledge of LDAP, REST APIs, and current authentication systems.
  • Familiarity with GitHub and Git-based workflows.
  • Understanding of RDS databases and network security technologies such as WAF.
  • Strong problem-solving skills and the ability to work well in a fast-paced, collaborative environment.
  • Excellent written and verbal communication skills.
  • Experience managing large-scale multi-cloud or hybrid infrastructure.
  • Strong background in Infrastructure as Code and GitOps workflows.
  • Familiarity with Kubernetes, Docker, and serverless platforms.
  • Proven track record improving observability, reliability, and incident response.
  • Understanding of compliance and security frameworks such as SOC 2, ISO 27001, and FedRAMP.
  • Experience implementing Zero Trust security and access models.
