Everbridge

Everbridge provides a comprehensive software platform that automates and enhances organizations' responses to critical events, ensuring the safety of individuals and the continuity of business operations during emergencies such as natural disasters, cy...

Internet Software & Services

Information Technology

1K-5K (1713)

Founded 2002

25 open positions

Staff Platform Site Reliability Specialist (Observability & Kubernetes)

12 hours, 6 minutes ago

Canada

Full-time

Lead

Site Reliability Engineer (SRE)

DevOps and Infrastructure

AWS GCP Grafana Kubernetes Terraform

Description

Own the design, operation, and ongoing evolution of Everbridge’s observability stack.
Build and maintain a highly available, scalable observability platform.
Standardize instrumentation, dashboards, alerts, and SLOs across engineering teams.
Support incident response, root cause analysis, and capacity planning.
Operate and scale Grafana and related telemetry services, including Loki, Mimir, Tempo, and Alerting.
Maintain the reliability and security of EKS clusters running the observability platform.
Manage Kubernetes cluster lifecycle activities, including upgrades.
Use Terraform to provision infrastructure as code.
Support automation and CI/CD workflows using HashiCorp Packer and GitLab CI/CD.
Collaborate professionally with other teams to keep systems running smoothly and move work forward.

Requirements

6+ years of experience in SRE or Platform Engineering.
Strong experience with the Grafana ecosystem.
Experience with Kubernetes and Amazon EKS.
Proficiency with Terraform.
Experience working with cloud technologies in AWS and GCP.
Familiarity with observability tooling such as Grafana Loki, Grafana Mimir, Grafana Tempo, and Grafana Alerting (preferred).
Experience with HashiCorp Packer and GitLab CI/CD at scale (preferred).
Ability to communicate clearly, collaborate effectively, and work respectfully with cross-functional teams.
Comfort supporting incident response and reliability-focused operations.
Experience in large-scale, cloud-native environments (preferred).

Benefits

Salary range of CAD $135,000 to $165,000, with possible variable compensation.
Comprehensive healthcare and dental care.
Mental health benefits.
Disability income benefits.
Life and AD&D insurance.
Retirement savings plan with employer match.
Paid time off.
