Staff Site Reliability Engineer, Fabric

2 hours, 13 minutes ago
Full-time
Lead
DevOps and Infrastructure
MongoDB

MongoDB provides a developer data platform that simplifies data management and accelerates application development, enabling businesses to leverage modern database technology for innovative solutions across various industries.

Internet Software & Services
1K-5K employees
Founded 2007

Description

  • Develop and maintain a reliable, resilient, globally connected multi-cloud network for MongoDB’s services.
  • Own infrastructure for secure communication between systems, including network architecture, service mesh, and edge load balancing.
  • Collaborate with service-owning teams to provide internal support and guidance on service-to-service connectivity best practices.
  • Investigate and resolve technical issues related to network architecture and connectivity.
  • Participate in a 24/7 on-call rotation to restore service quickly and minimize disruption.
  • Contribute to observability, alerting, and deployment infrastructure as part of the broader Platform Engineering organization.
  • Drive automation and process improvements to reduce manual operational work.
  • Support secure data transmission and high availability across a multi-cloud environment.

Requirements

  • 10+ years of experience building software and operating distributed systems.
  • Deep expertise in networking fundamentals, including TCP/IP, IPv6, DNS, TLS/mTLS, BGP, tunnels, overlays, and SDN principles.
  • Strong understanding of how the internet works at a protocol and infrastructure level.
  • Experience with at least one major cloud provider: AWS, Azure, or GCP.
  • Knowledge of cloud network design primitives such as VPCs, subnetting, routing, VPNs, peering, Private Link / Private Service Connect, and CDNs.
  • Strong knowledge of service mesh and load-balancing concepts.
  • Experience working in or supporting multi-cloud infrastructure environments.
  • Customer-focused mindset with a preference for automation over manual processes.
  • Ability to participate in a 24/7 on-call rotation.
  • Preferred: experience implementing service mesh and load balancing in a multi-cloud environment.

Benefits

  • Base salary range of $127,000–$249,000 USD for U.S.-based candidates.
  • Equity as part of the total compensation package.
  • Employee stock purchase program.
  • Flexible paid time off.
  • 20 weeks of fully paid gender-neutral parental leave.
  • Fertility and adoption assistance.
  • 401(k) plan.
  • Mental health counseling and transgender-inclusive health insurance coverage.


