Site Reliability Engineer (Senior or Staff), Storage Layer Services (SLS)

5 days ago
Full-time
Senior
Software Development
MongoDB

MongoDB provides a developer data platform that simplifies data management and accelerates application development, enabling businesses to leverage modern database technology for innovative solutions across various industries.

Internet Software & Services
1K-5K employees
Founded 2007

Description

  • Work on multi-tenant distributed storage systems while balancing long-term infrastructure goals with immediate engineering needs.
  • Build reliable, resilient, fault-tolerant, and self-healing services and infrastructure.
  • Define and configure metrics to detect incidents and measure service health, availability, and performance.
  • Participate in a 24/7 on-call rotation to resolve storage infrastructure issues.
  • Optimize infrastructure performance across the stack, from the application layer down to the kernel.
  • Partner with engineering teams to define SLOs and capacity plans for storage services.
  • Support the operational safety, durability, and consistency of the Atlas storage layer.

Requirements

  • 6+ years of experience in software development and operating distributed systems.
  • Proficiency in Python, Go, or a similar programming language.
  • Experience operating or supporting stateful storage or database systems at scale.
  • Comfort with durability, consistency, and recovery trade-offs in storage systems.
  • Customer-focused mindset.
  • Strong bias toward efficiency and automation over manual processes.
  • Experience using and extending Kubernetes or similar container orchestration technologies.
  • Experience with cloud infrastructure platforms such as AWS, Google Cloud Platform (GCP), or Azure.
  • Understanding of Linux internals and networking concepts including TCP/IP, DNS, TLS, and routing.
  • Preferred: Experience leading major architectural shifts from legacy storage stacks to multi-tenant storage architectures.
  • Preferred: Experience planning and executing large-scale data and workload migrations with tight availability and durability requirements.
  • Preferred: Experience managing and scaling infrastructure across multi-cloud environments.
  • Preferred: Experience designing secure, multi-tenant runtime environments at scale.

Benefits

  • Base salary range of $144,000 to $248,000 USD for U.S.-based candidates.
  • Equity and participation in the employee stock purchase program.
  • Flexible paid time off.
  • 20 weeks of fully paid gender-neutral parental leave.
  • Fertility and adoption assistance.
  • 401(k) plan.
  • Mental health counseling.
  • Access to transgender-inclusive health insurance coverage and other health benefits.


