Everbridge

Everbridge provides a comprehensive software platform that automates and enhances organizations' responses to critical events, ensuring the safety of individuals and the continuity of business operations during emergencies such as natural disasters, cy...

Internet Software & Services

Information Technology

1K-5K (1713)

Founded 2002

5 open positions

Site Reliability Specialist (Observability & Kubernetes)

18 hours, 58 minutes ago

United States

Full-time

Senior

Site Reliability Engineer (SRE)

DevOps and Infrastructure

AWS, GCP, Grafana, Kubernetes, OpenTelemetry, Terraform

Description

Own the design, operation, and evolution of the observability platform.
Build and maintain a highly available and scalable observability stack.
Standardize instrumentation, dashboards, alerts, and SLOs across engineering teams.
Support incident response, root cause analysis, and capacity planning.
Operate and scale Grafana and its telemetry components, including Loki, Mimir, Tempo, and Alerting.
Maintain the reliability and security of EKS clusters supporting observability services.
Manage Kubernetes cluster lifecycle activities, including upgrades.
Provision infrastructure using Terraform and automate platform operations.
Work with GitLab CI/CD pipelines at scale.
Collaborate across teams in AWS and GCP environments.

Requirements

6+ years of experience in SRE or Platform Engineering.
Strong experience with the Grafana ecosystem.
Hands-on expertise with Kubernetes and Amazon EKS.
Proficiency with Terraform.
Experience with OpenTelemetry (preferred).
Experience operating large-scale observability systems (preferred).
Cost optimization experience (preferred).
Experience with infrastructure provisioning and automation tools such as HashiCorp Packer and GitLab CI/CD.
Ability to work remotely in the United States.
Strong communication, collaboration, and professionalism with cross-functional teams.

Benefits

Salary range of $118,700 to $145,000 per year, plus possible variable compensation.
Healthcare and dental coverage.
Parental planning benefits.
Mental health benefits.
Disability income benefits.
Life and AD&D insurance.
401(k) plan with company match.
Paid time off and fitness reimbursements.

Similar Roles

Site Reliability Engineer (Senior or Staff), Atlas

MongoDB 1K-5K Internet Software & Services

MongoDB is hiring a Senior Site Reliability Engineer for its Atlas team to help support, maintain, and grow a multi-cloud platform for customer-facing production workloads.

United States, Full-time, Senior, Site Reliability Engineer (SRE)

$127k-$249k

AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS

3 hours, 53 minutes ago

Manager, Software Engineering (Resilience Engineering)

Affirm 1K-5K Diversified Financial Services

Affirm is seeking an Engineering Manager to lead its Resilience Engineering team, building production load testing and chaos engineering capabilities that improve the safety and reliability of production systems.

United States, Full-time, Lead, Engineering Manager, Site Reliability Engineer (SRE)

$200k-$275k

AWS, Java, Kotlin, Kubernetes, Microservices, Python

4 hours, 2 minutes ago

Site Reliability Engineer (Senior or Staff), Storage Layer Services (SLS)

MongoDB 1K-5K Internet Software & Services

MongoDB’s Storage Layer Services team is hiring a Site Reliability Engineer to help re-architect the cloud storage layer for Atlas and ensure the reliability and operational safety of its distributed storage infrastructure.

United States, Full-time, Senior, Site Reliability Engineer (SRE)

$144k-$248k

AWS, Azure, DNS, GCP, Go, Kubernetes, Linux, Python, TCP/IP, TLS

4 hours, 50 minutes ago

Manager, Software Engineering (Resilience Engineering)

Affirm 1K-5K Diversified Financial Services

Affirm is hiring an Engineering Manager to lead its Resilience Engineering team in building production load testing and chaos engineering capabilities that improve the safety and reliability of its production systems.

Canada, Full-time, Lead, Engineering Manager, Site Reliability Engineer (SRE)

$178k-$228k

AWS, Java, Kotlin, Kubernetes, Python

7 hours, 6 minutes ago

