Obsidian Security

Obsidian Security is a Southern California-based company at the forefront of cybersecurity, artificial intelligence, and hybrid cloud environments. They offer a comprehensive security solution for businesses, including advanced threat protection, insid...

Internet Software & Services

Information Technology

51-250 (150)

Founded 2017

$30M raised

19 open positions

Sr. Staff Site Reliability Engineer

2 months, 1 week ago

United States

Full-time

Lead

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Argo CD, AWS, GCP, Grafana, Helm, Kubernetes, Microservices, Prometheus

Description

Define and drive the company-wide reliability strategy across services.
Establish end-to-end system visibility frameworks for observability, detection, and resilience.
Partner with DevOps and Platform Engineering leadership to standardize SLI/SLOs and improve reliability practices across teams.
Serve as a technical escalation expert for reliability issues and incident response.
Build intelligent detection systems, including anomaly detection and connector health models.
Enable self-service observability for engineering teams.
Define and evolve a tiered incident communication strategy.
Lead postmortems and improve incident response practices to strengthen customer trust.
Contribute hands-on to system design, monitoring, and debugging across distributed systems and data pipelines.

Requirements

5+ years of experience in SRE, Production Engineering, or a related role.
3+ years of experience operating at a senior or technical leadership level, such as Staff scope or equivalent.
Deep expertise with AWS and/or GCP.
Experience with Kubernetes and Helm.
Experience with observability stacks such as Prometheus and Grafana, or equivalent tools.
Experience with CI/CD systems such as GitLab CI/CD and Argo CD, or similar tools.
Proven experience designing and scaling reliability systems for multi-tenant SaaS platforms.
Strong debugging and systems thinking across distributed microservices and legacy systems.
Demonstrated ability to lead initiatives that improve incident detection, response, and system resilience.
Hands-on engineering approach with a track record of building reliability systems, not just configuring them.
Experience in B2B SaaS serving enterprise or financial customers, preferred.
Familiarity with third-party SaaS connector architectures and ingestion patterns, preferred.
Experience building anomaly detection or intelligent alerting systems, preferred.
Experience designing customer-facing status pages and incident communication frameworks, preferred.

Benefits

Competitive compensation with equity and 401(k).
Comprehensive healthcare with dental and vision coverage.
Flexible paid time off and paid holiday time off.
12 weeks of new parent or family leave.
Personal and professional development resources.
Base salary range of $232,000 to $263,000 USD.
Eligibility for equity awards and possible sales commission or incentive compensation, depending on role or function.
