Alphasense

Alphasense is a global leader in providing high-quality gas sensors and air quality monitors to industrial OEMs. With over 25 years of experience, the company offers a wide range of innovative gas sensor technologies for various applications, including...

Industrial Conglomerates
51-250 employees
Founded 1996

Description

  • Architect reliability frameworks and self-service tooling that enable teams to own the reliability of their services.
  • Lead the company’s AIOps strategy by automating diagnostics, remediation, and proactive failure prevention.
  • Embed SRE practices across engineering through design reviews, production readiness reviews, and operational standards.
  • Serve as Incident Commander during critical incidents and drive blameless postmortems that produce lasting improvements.
  • Build end-to-end monitoring, tracing, and profiling capabilities to improve performance and reliability proactively.
  • Mentor engineers across SRE and product teams through technical guidance and knowledge sharing.
  • Influence architectural decisions and set the technical bar for reliability across the organization.
  • Lead by example in incident response and operational excellence across the global engineering team.

Requirements

  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
  • At least 3 years of experience operating in a Senior+ SRE position.
  • Strong background in running production SaaS systems at scale.
  • Proficiency in at least one programming or scripting language such as Python or Go.
  • Hands-on experience with Kubernetes and with cloud platforms such as AWS, GCP, or Azure.
  • Deep understanding of networking fundamentals, including TCP/IP, DNS, HTTP/S, and load balancing.
  • Experience with monitoring and alerting tools such as Prometheus, Grafana, Datadog, or ELK.
  • Familiarity with advanced observability tools and practices such as OpenTelemetry (OTel) and continuous profiling.
  • Proven incident management experience, including leading high-severity incidents and postmortems.
  • Strong troubleshooting skills across the full stack.
  • Excellent communication and collaboration skills.
