Sr. Staff Site Reliability Engineer

1 day, 20 hours ago
Full-time
Lead
DevOps and Infrastructure
Obsidian Security

Obsidian Security is a Southern California-based company at the forefront of cybersecurity, artificial intelligence, and hybrid cloud environments. It offers a comprehensive security solution for businesses, including advanced threat protection, insid...

Internet Software & Services
51-250
Founded 2017
$30M raised

Description

  • Define and drive the company-wide reliability strategy across services.
  • Establish end-to-end system visibility frameworks for observability, detection, and resilience.
  • Partner with DevOps and Platform Engineering leadership to standardize SLI/SLOs and improve reliability practices across teams.
  • Serve as a technical escalation expert for reliability issues and incident response.
  • Build intelligent detection systems, including anomaly detection and connector health models.
  • Enable self-service observability for engineering teams.
  • Define and evolve a tiered incident communication strategy.
  • Lead postmortems and improve incident response practices to strengthen customer trust.
  • Contribute hands-on to system design, monitoring, and debugging across distributed systems and data pipelines.

Requirements

  • 5+ years of experience in SRE, Production Engineering, or a related role.
  • 3+ years of experience operating at a senior or technical leadership level, such as Staff scope or equivalent.
  • Deep expertise with AWS and/or GCP.
  • Experience with Kubernetes and Helm.
  • Experience with observability stacks such as Prometheus and Grafana, or equivalent tools.
  • Experience with CI/CD systems such as GitLab CI/CD and ArgoCD, or similar tools.
  • Proven experience designing and scaling reliability systems for multi-tenant SaaS platforms.
  • Strong debugging and systems thinking across distributed microservices and legacy systems.
  • Demonstrated ability to lead initiatives that improve incident detection, response, and system resilience.
  • Hands-on engineering approach with a track record of building reliability systems, not just configuring them.
  • Preferred: experience in B2B SaaS serving enterprise or financial customers.
  • Preferred: familiarity with third-party SaaS connector architectures and ingestion patterns.
  • Preferred: experience building anomaly detection or intelligent alerting systems.
  • Preferred: experience designing customer-facing status pages and incident communication frameworks.

Benefits

  • Competitive compensation with equity and 401(k).
  • Comprehensive healthcare with dental and vision coverage.
  • Flexible paid time off and paid holidays.
  • 12 weeks of new parent or family leave.
  • Personal and professional development resources.
  • Base salary range of $232,000 to $263,000 USD.
  • Eligibility for equity awards and possible sales commission or incentive compensation, depending on role or function.
