Dropbox

Dropbox is a technology company that builds simple, powerful products for individuals and businesses. With over 700 million registered users worldwide, Dropbox offers file sync, sharing, online backup, cloud storage, collaboration tools, and more to st...

Industry: Internet Software & Services
Company size: 1K-5K
Founded: 2007

Description

  • Ensure the reliability, scalability, and performance of Dropbox's infrastructure and services.
  • Collaborate with cross-functional teams to define and maintain best practices for monitoring, logging, and incident response.
  • Build, implement, and maintain automation and infrastructure-as-code tooling using Terraform, Ansible, GitHub Actions, and custom code platforms.
  • Use container orchestration platforms such as Kubernetes, Amazon ECS, and Red Hat OpenShift to manage containers at scale.
  • Manage and optimize monitoring and logging pipelines using tools such as Datadog and Cribl LogStream.
  • Lead improvement projects that increase service health and visibility for stakeholders across technical and business teams.
  • Develop and maintain custom tooling and automation scripts in Bash, Python, and other scripting languages.
  • Handle incidents and occasional on-call work related to bugs, outages, or other operational issues.

Requirements

  • 5+ years of experience in site reliability engineering or a similar engineering role with hands-on coding experience.
  • Strong knowledge of AWS services, including EC2, S3, RDS, Route 53, Lambda, and others.
  • Strong knowledge of Linux administration and internals, including filesystems and volume management, on Ubuntu and RHEL, as well as DNS and DHCP.
  • Experience with monitoring and logging tools such as Datadog and pipeline tools such as Vector or Cribl LogStream.
  • Experience driving transformational programs related to metrics and observability.
  • Experience with scripting in a higher-level language, with Python preferred.
  • Experience developing automation for infrastructure tasks using Chef, Ansible, or Terraform.
  • Experience with log analysis and building metrics, alerts, and visuals from log data.
  • Strong proficiency with infrastructure-as-code tools such as Terraform.
  • Strong proficiency with configuration management tools, especially Ansible Automation Platform and Chef.
  • Experience with containerization technologies such as Docker and orchestration platforms like Kubernetes or Amazon ECS.
  • Knowledge of LDAP, REST APIs, and current authentication systems.
  • Familiarity with GitHub and Git-based workflows.
  • Understanding of RDS databases and network security technologies such as WAF.
  • Experience managing large-scale multi-cloud or hybrid infrastructure.
  • Familiarity with Kubernetes, Docker, and serverless platforms.
  • Understanding of compliance and security frameworks such as SOC 2, ISO 27001, and FedRAMP.
  • Experience implementing Zero Trust security and access models.
  • Strong problem-solving skills and the ability to work well in a fast-paced, collaborative environment.
  • Excellent written and verbal communication skills.
