Staff Site Reliability Engineer, Fabric

3 weeks, 1 day ago
Full-time
Lead
DevOps and Infrastructure
MongoDB

MongoDB

MongoDB provides a developer data platform that simplifies data management and accelerates application development, enabling businesses to leverage modern database technology for innovative solutions across various industries.

Internet Software & Services
1K-5K
Founded 2007

Description

  • Develop and maintain a reliable, resilient multi-cloud network that supports MongoDB services.
  • Own infrastructure for secure communication between systems and between internal services and the public internet.
  • Collaborate with service-owning teams to troubleshoot technical issues and advise on best practices for service-to-service connectivity.
  • Participate in a 24/7 on-call rotation to resolve network architecture and connectivity incidents quickly.
  • Help design and operate network architecture, service mesh, and edge load balancing systems.
  • Support the broader engineering organization with critical infrastructure and operational functions.
  • Drive automation and process efficiency to reduce manual operational work.
  • Contribute to observability, alerting, and deployment-related infrastructure as part of Platform Engineering.

Requirements

  • 10+ years of experience building software and operating distributed systems.
  • Deep expertise in networking fundamentals and internet protocols, including TCP/IP, IPv6, DNS, TLS/mTLS, BGP, tunnels, overlays, and SDN principles.
  • Strong understanding of how the internet works.
  • Experience with at least one major cloud provider: AWS, Azure, or GCP.
  • Familiarity with cloud network design primitives such as VPCs, subnetting, routing, VPNs, peering, PrivateLink / Private Service Connect, and CDNs.
  • Strong knowledge of service mesh and load-balancing concepts.
  • Experience implementing service mesh and load balancing in a multi-cloud environment.
  • Customer-focused mindset with attention to end-user impact.
  • Strong preference for automation over manual operational processes.
  • Ability to participate in a 24/7 on-call rotation.

Benefits

  • Base salary range in Canada: $144,000 to $200,000 CAD.
  • Equity as part of the total compensation package.
  • Employee stock purchase program.
  • Flexible paid time off.
  • 20 weeks of fully paid gender-neutral parental leave.
  • Fertility and adoption assistance.
  • Registered Retirement Savings Plan (RRSP) with employer match.
  • Mental health counseling, backup child and elder care, and health, dental, and vision benefits.


