Everbridge

Everbridge provides a comprehensive software platform that automates and enhances organizations' responses to critical events, ensuring the safety of individuals and the continuity of business operations during emergencies such as natural disasters, cy...

Internet Software & Services

Information Technology

1K-5K (1713)

Founded 2002

23 open positions


Staff Platform Site Reliability Specialist (Observability & Kubernetes)

2 hours, 30 minutes ago

Canada

Full-time

Senior

Site Reliability Engineer (SRE)

DevOps and Infrastructure

AWS GCP Grafana Kubernetes Terraform

Description

Own the design, operation, and evolution of Everbridge’s observability stack.
Build and maintain a highly available and scalable observability platform.
Standardize instrumentation, dashboards, alerts, and service level objectives (SLOs).
Support incident response, root cause analysis, and capacity planning.
Operate and scale the Grafana ecosystem, including Grafana Loki, Mimir, Tempo, and Alerting.
Maintain the reliability and security of EKS clusters that support the observability platform.
Manage Kubernetes cluster lifecycle activities, including upgrades.
Provision infrastructure using Terraform.
Support infrastructure automation with HashiCorp Packer and GitLab CI/CD at scale.
Collaborate professionally with other teams to keep work moving forward and build trust.

Requirements

6+ years of experience in SRE or Platform Engineering.
Strong experience with the Grafana ecosystem.
Hands-on experience with Kubernetes and Amazon EKS.
Proficiency with Terraform.
Experience working with AWS and GCP cloud technologies.
Preferred: experience with infrastructure provisioning and automation tools such as HashiCorp Packer and GitLab CI/CD.
Ability to communicate clearly and collaborate effectively across teams.
Comfort working in a highly visible, enterprise-scale cloud-native environment.

Benefits

Salary range of CAD $135,000 to $165,000, with possible variable compensation.
Comprehensive healthcare and dental care benefits.
Mental health benefits.
Disability income benefits.
Life and AD&D insurance.
Retirement savings plan with employer match.
Paid time off.

Similar Roles

Lead Engineer - Platform Performance & Reliability

HighLevel 251-1K Internet Software & Services

HighLevel is hiring a Lead Engineer for its Platform Performance & Reliability team to improve the speed, stability, and operational health of a high-traffic global SaaS platform.

India Full-time Senior Backend Engineer Site Reliability Engineer (SRE)

AWS ClickHouse Firestore GCP Grafana Kubernetes Microservices MongoDB MySQL Node.js OpenTelemetry PostgreSQL Prometheus Redis

15 minutes ago

Senior Cluster Site Reliability Engineer

The Voleon Group 51-250 Capital Markets

Senior Cluster Site Reliability Engineer at Voleon, responsible for scaling and operating the company’s research compute cluster that supports machine learning research and investment management workloads across on-prem and cloud environments.

United States Full-time Senior Site Reliability Engineer (SRE)

$205k-$235k

Ansible Apache Airflow Apache Spark AWS Docker GCP Grafana Kubeflow Kubernetes Machine Learning OpenTelemetry Podman Prometheus Python PyTorch Ruby TensorFlow Terraform

30 minutes ago

Infrastructure Reliability Engineer

Tecsys 251-1K Air Freight & Logistics

Tecsys is seeking an infrastructure reliability engineer for its NOC to ensure the reliability, performance, and evolution of its critical SaaS platforms on AWS and Kubernetes.

Canada Full-time Senior Site Reliability Engineer (SRE)

AWS Datadog Kubernetes Terraform

45 minutes ago

Sr. Site Reliability Engineer (Remote, Mexico)

Nova: Onshore and Nearshore Engineering Solutions Internet Software & Services

IO Connect Services is seeking a remote Senior Site Reliability Engineer in Mexico to help design, automate, and scale cloud infrastructure and production services for customer deployments across a LATAM engineering team.

Mexico Full-time Senior Site Reliability Engineer (SRE)

Ansible AWS Azure C++ Chef CI/CD Datadog GCP HDFS Java JavaScript Kubernetes PowerShell Puppet Python Ruby Terraform

1 hour ago

