Obsidian Security

Obsidian Security is a Southern California-based company at the forefront of cybersecurity, artificial intelligence, and hybrid cloud environments. They offer a comprehensive security solution for businesses, including advanced threat protection, insid...

Internet Software & Services

Information Technology

51-250 (150)

Founded 2017

$30M raised

16 open positions

Site Reliability Engineering (SRE) Tech Lead

2 hours, 3 minutes ago

United States

Full-time

Lead

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Argo CD, AWS, GCP, Grafana, Helm, Kubernetes, Microservices, Prometheus

Description

Map and instrument critical system paths for top-tier enterprise customers.
Build connector health models to distinguish internal defects, upstream SaaS outages, and expected sparse or low-signal scenarios.
Establish tiered incident communication, including a public status page and direct outreach for high-priority accounts.
Define and roll out SLI/SLO standards across microservices.
Develop self-service instrumentation tooling so engineering teams can own observability.
Implement baseline-aware anomaly detection across connectors beyond static thresholds.
Mature incident response processes through structured post-mortems and continuous reliability improvements.
Lead a unified reliability strategy in partnership with DevOps and Platform Engineering leads.
Architect and implement systems for monitoring complex, mission-critical SaaS workloads.

Requirements

7+ years of experience in SRE, production engineering, or a similar role.
2+ years of experience operating as a technical lead.
Deep expertise with AWS and/or GCP.
Experience with Kubernetes and Helm.
Experience with observability tools such as Prometheus and Grafana.
Experience with CI/CD systems such as GitLab CI/CD and Argo CD.
Proven experience building monitoring for multi-tenant SaaS systems with complex data pipelines.
Strong debugging skills across distributed microservices and legacy systems.
Hands-on engineering mindset with the ability to instrument services directly, not just configure tooling.
Track record of building or significantly improving incident detection and response systems.
Experience in B2B SaaS serving enterprise or financial customers is preferred.
Familiarity with third-party SaaS connector ingestion patterns is preferred.
Experience building anomaly detection systems or baseline-aware alerting is preferred.
Experience implementing customer-facing status pages and incident communication frameworks is preferred.

Benefits

Competitive compensation with equity and 401(k).
Comprehensive healthcare with dental and vision coverage.
Flexible paid time off and paid holiday time off.
12 weeks of new parent or family leave.
Personal and professional development resources.
Base salary range of $250,000 to $280,000 USD.
Eligible for equity awards and may be eligible for sales commission or incentive compensation.

Similar Roles

Member of Technical Staff, Fleet Reliability

Pure Storage 1K-5K IT Services

Pure Storage is hiring a Forensics Software Engineer to own fleet reliability and build investigative and predictive solutions that help diagnose customer issues and protect globally distributed systems.

India Full-time Entry Level Site Reliability Engineer (SRE)

C++, Go, Java, Linux, Python

18 minutes ago

Senior Software Engineer - Search Platform

Algolia 251-1K Internet Software & Services

Algolia is hiring a Senior Software Engineer to join the Metis team and help build and operate the cloud-based distributed architecture behind its NeuralSearch AI search engine.

France United Kingdom Full-time Senior Site Reliability Engineer (SRE) Software Engineer

Go, Kubernetes

1 hour, 33 minutes ago

Staff Site Reliability Engineer

AlphaSense 51-250 Industrial Conglomerates

AlphaSense is hiring a Staff Site Reliability Engineer to shape reliability, scalability, and performance for its AI-driven market intelligence platform and global engineering organization.

United States Full-time Lead Site Reliability Engineer (SRE)

$150k-$225k

AWS, Azure, Datadog, DNS, GCP, Go, Grafana, Kubernetes, Load Balancing, OpenTelemetry, Prometheus, Python, TCP/IP

3 hours, 3 minutes ago

Staff Site Reliability Engineer

AlphaSense 51-250 Industrial Conglomerates

AlphaSense is hiring a Staff Site Reliability Engineer to architect and scale reliability, observability, and incident-response practices for its global SaaS platform.

India Full-time Lead Site Reliability Engineer (SRE)

AWS, Azure, Datadog, DNS, GCP, Go, Grafana, Kubernetes, Load Balancing, OpenTelemetry, Prometheus, Python, TCP/IP

4 hours, 18 minutes ago
