Staff Site Reliability Engineer

1 hour, 37 minutes ago
Full-time
Lead
DevOps and Infrastructure
Obsidian Security

Obsidian Security is a Southern California-based company at the forefront of cybersecurity, artificial intelligence, and hybrid cloud environments. They offer a comprehensive security solution for businesses, including advanced threat protection, insid...

Internet Software & Services
51-250
Founded 2017
$30M raised

Description

  • Define and lead the long-term reliability strategy across services.
  • Establish end-to-end system visibility frameworks for observability, detection, and resilience.
  • Partner with DevOps, Platform Engineering, and other teams to standardize SLIs and SLOs and embed reliability practices.
  • Serve as a technical escalation expert for reliability-related issues.
  • Build intelligent detection systems, including anomaly detection and connector health models.
  • Enable self-service observability for engineering teams.
  • Define, evolve, and execute a tiered incident communication strategy.
  • Improve incident response practices and lead postmortems to strengthen reliability and customer trust.
  • Contribute hands-on to system design, monitoring, and debugging across distributed systems and data pipelines.

Requirements

  • 5+ years of experience in SRE, Production Engineering, or a related role.
  • 3+ years operating at a senior or technical leadership level with Staff-level or equivalent scope.
  • Deep expertise with AWS and/or GCP.
  • Experience with Kubernetes and Helm.
  • Experience with observability tools such as Prometheus and Grafana, or equivalent stacks.
  • Experience with CI/CD systems such as GitLab CI/CD and ArgoCD, or similar tools.
  • Proven experience designing and scaling reliability systems for multi-tenant SaaS platforms.
  • Strong debugging and systems thinking across distributed microservices and legacy systems.
  • Demonstrated ability to lead initiatives that improve incident detection, response, and system resilience.
  • Hands-on engineering approach with a track record of building reliability systems, not just configuring them.
  • Preferred: experience in B2B SaaS serving enterprise or financial customers.
  • Preferred: familiarity with third-party SaaS connector architectures and ingestion patterns.
  • Preferred: experience building anomaly detection or intelligent alerting systems.
  • Preferred: experience designing customer-facing status pages and incident communication frameworks.

Benefits

  • Competitive compensation with equity and 401(k).
  • Comprehensive healthcare with dental and vision coverage.
  • Flexible paid time off, plus paid holidays.
  • 12 weeks of new parent or family leave.
  • Personal and professional development resources.
  • Base salary range of £124,000 to £141,000 GBP.
  • Potential eligibility for additional equity awards and incentive compensation, depending on role and function.
