Staff Site Reliability Engineer, Fabric

4 hours, 53 minutes ago
Full-time
Lead
DevOps and Infrastructure
MongoDB

MongoDB provides a developer data platform that simplifies data management and accelerates application development, enabling businesses to leverage modern database technology for innovative solutions across various industries.

Internet Software & Services
1K-5K
Founded 2007

Description

  • Develop and maintain a reliable, resilient multi-cloud network that supports MongoDB services.
  • Own infrastructure for secure communication between systems and between internal services and the public internet.
  • Collaborate with service-owning teams to troubleshoot technical issues and advise on best practices for service-to-service connectivity.
  • Participate in a 24/7 on-call rotation to resolve network architecture and connectivity incidents quickly.
  • Help design and operate network architecture, service mesh, and edge load balancing systems.
  • Support the broader engineering organization with critical infrastructure and operational functions.
  • Drive automation and process efficiency to reduce manual operational work.
  • Contribute to observability, alerting, and deployment-related infrastructure as part of Platform Engineering.

Requirements

  • 10+ years of experience building software and operating distributed systems.
  • Deep expertise in networking fundamentals and internet protocols, including TCP/IP, IPv6, DNS, TLS/mTLS, BGP, tunnels, overlays, and SDN principles.
  • Strong understanding of how the internet works.
  • Experience with at least one major cloud provider: AWS, Azure, or GCP.
  • Familiarity with cloud network design primitives such as VPCs, subnetting, routing, VPNs, peering, PrivateLink / Private Service Connect, and CDNs.
  • Strong knowledge of service mesh and load-balancing concepts.
  • Experience implementing service mesh and load balancing in a multi-cloud environment.
  • Customer-focused mindset, with attention to end-user impact.
  • Strong preference for automation over manual operational processes.
  • Ability to participate in a 24/7 on-call rotation.

Benefits

  • Base salary range in Canada of $144,000 to $200,000 CAD.
  • Equity as part of the total compensation package.
  • Employee stock purchase program.
  • Flexible paid time off.
  • 20 weeks of fully paid gender-neutral parental leave.
  • Fertility and adoption assistance.
  • Registered Retirement Savings Plan (RRSP) with employer match.
  • Mental health counseling, backup child and elder care, and health, dental, and vision benefits.
