Obsidian Security

Obsidian Security is a Southern California-based company at the forefront of cybersecurity, artificial intelligence, and hybrid cloud environments. It offers a comprehensive security solution for businesses, including advanced threat protection, insid...

Internet Software & Services
51-250 employees
Founded 2017
$30M raised

Description

  • Improve the reliability, availability, and resiliency of production systems and distributed services.
  • Build and maintain monitoring, alerting, dashboards, and observability tooling.
  • Support incident response, on-call operations, troubleshooting, and postmortem processes.
  • Partner with engineering teams to implement SLI/SLO practices and reliability-focused workflows.
  • Automate infrastructure operations, deployment workflows, and platform tooling across Kubernetes, cloud infrastructure, and data pipelines.
  • Collaborate with DevOps, Platform Engineering, and product teams to improve observability, incident response, and service resilience.
  • Help ensure production issues are detected and addressed quickly.
  • Contribute to operational standards and continuous improvement across the platform.

Requirements

  • 3-6 years of experience in Site Reliability Engineering, DevOps, Production Engineering, or related roles.
  • Experience operating and supporting production systems in AWS and/or GCP.
  • Familiarity with Kubernetes and Helm in cloud-native environments.
  • Experience with observability and monitoring tools such as Prometheus, Grafana, Datadog, or similar platforms.
  • Exposure to CI/CD systems such as GitLab CI/CD, GitHub Actions, ArgoCD, or equivalent.
  • Strong troubleshooting and debugging skills across distributed systems and microservices.
  • Experience writing automation or infrastructure tooling using scripting or programming languages.
  • Strong systems thinking and a collaborative engineering mindset.
  • AI agent development experience is preferred.
  • Experience supporting SaaS platforms in production environments is preferred.
  • Familiarity with incident management and postmortem practices is preferred.
  • Exposure to infrastructure-as-code and GitOps workflows is preferred.
  • Understanding of SLI/SLO concepts and operational metrics is preferred.
  • Experience with enterprise-scale monitoring or customer-facing production systems is preferred.

Benefits

  • Competitive compensation with equity and a 401(k).
  • Base salary range of £95,000-£117,000 GBP.
  • Comprehensive healthcare with dental and vision coverage.
  • Flexible paid time off plus paid holidays.
  • 12 weeks of new parent or family leave.
  • Personal and professional development resources.
  • Eligible for equity awards, with possible sales commission or incentive compensation depending on role or function.

Interested in this position?

Apply directly on the company website

Similar Roles

Senior Manager, Engineering

Sumo Logic 251-1K Internet Software & Services

Sumo Logic is hiring a Senior Manager, Engineering for Application Security to lead global programs that improve product security, reliability, and operational efficiency across its cloud platform.

Agile AWS C++ Docker GCP Java Kafka Kubernetes OWASP Ruby Scala SIEM
16 hours, 38 minutes ago

Staff Software Engineer - Databases SRE | Sweden | Remote

Grafana 1K-5K IT Services

Grafana Labs is hiring a Staff Software Engineer, SRE to improve the reliability and scalability of Grafana Cloud’s database products for high-value customers across AWS, GCP, and Azure.

AWS Azure GCP Go Helm Java Kubernetes Linux Microservices Python Terraform
1 day, 15 hours ago

Senior Site Reliability Engineer (SRE)

Oowlish 51-250 Internet Software & Services

Oowlish is hiring a Senior Site Reliability Engineer to own the reliability and operational excellence of business-critical production systems for international clients in a remote, collaborative environment.

AWS Datadog Go Heroku Kubernetes PostgreSQL Python SQL Server TypeScript
1 day, 16 hours ago

Staff Software Engineer - Databases SRE | Spain | Remote

Grafana 1K-5K IT Services

Grafana Labs is hiring a Staff Software Engineer - SRE to strengthen the reliability of its cloud database products for high-SLA customers across AWS, GCP, and Azure.

AWS Azure GCP Go Helm Java Kubernetes Linux Python Terraform
1 day, 16 hours ago