Staff Platform Site Reliability Specialist (Observability & Kubernetes)

2 hours, 11 minutes ago
Remote
Full-time
Lead
DevOps and Infrastructure
Everbridge

Everbridge provides a comprehensive software platform that automates and enhances organizations' responses to critical events, ensuring the safety of individuals and the continuity of business operations during emergencies such as natural disasters, cy...

Internet Software & Services
1K-5K
Founded 2002

Description

  • Own the design, operation, and evolution of Everbridge’s observability stack.
  • Build and maintain a highly available, scalable observability platform.
  • Standardize instrumentation, dashboards, alerts, and SLOs.
  • Support incident response, root cause analysis, and capacity planning.
  • Operate and scale Grafana and its telemetry stack, including Loki, Mimir, Tempo, and Alerting.
  • Maintain the reliability and security of EKS clusters that run observability services.
  • Manage Kubernetes cluster lifecycle activities, including upgrades.
  • Provision infrastructure using Terraform and automate platform workflows.
  • Support CI/CD at scale using GitLab CI/CD.
  • Work across AWS and GCP cloud technologies to support platform reliability.

Requirements

  • 6+ years of experience in SRE or Platform Engineering.
  • Strong experience with the Grafana ecosystem.
  • Hands-on expertise with Kubernetes and Amazon EKS.
  • Proficiency with Terraform.
  • Experience operating observability platforms in cloud-native environments.
  • Experience with infrastructure as code and platform automation tools such as Packer and GitLab CI/CD.
  • Ability to collaborate clearly and professionally with cross-functional teams.
  • Comfort working in a team-oriented environment and supporting others without ego.

Benefits

  • Salary range of CAD $135,000–$165,000, with potential variable compensation.
  • Comprehensive healthcare benefits.
  • Dental care coverage.
  • Mental health benefits.
  • Disability income benefits.
  • Life and AD&D insurance.
  • Retirement savings plan with employer match.
  • Paid time off.
