Alphasense

Alphasense is a global leader in providing high-quality gas sensors and air quality monitors to industrial OEMs. With over 25 years of experience, the company offers a wide range of innovative gas sensor technologies for various applications, including...

Industrial Conglomerates
51-250
Founded 1996

Description

  • Architect reliability frameworks and self-service tooling that enable teams to own the reliability of their services.
  • Lead the company’s AIOps strategy by automating diagnostics, remediation, and proactive failure prevention.
  • Embed SRE practices across engineering through design reviews, production readiness work, and operational standards.
  • Serve as Incident Commander during critical incidents and ensure blameless postmortems drive lasting improvements.
  • Build end-to-end observability through monitoring, tracing, and profiling to proactively improve performance.
  • Mentor engineers across SRE and product teams through technical guidance and knowledge sharing.
  • Influence architectural decisions and set the technical standard for reliability across the organization.

Requirements

  • 8+ years of experience in Site Reliability Engineering, DevOps, or a similar role.
  • At least 3 years of experience in a Senior+ SRE position.
  • Experience running production SaaS systems at scale.
  • Proficiency in at least one programming or scripting language such as Python or Go.
  • Hands-on experience with cloud platforms such as AWS, GCP, or Azure.
  • Hands-on experience with Kubernetes.
  • Strong understanding of networking fundamentals, including TCP/IP, DNS, HTTP/S, and load balancing.
  • Experience with monitoring and alerting tools such as Prometheus, Grafana, Datadog, or ELK.
  • Familiarity with advanced observability tools and practices such as OpenTelemetry (OTel) and continuous profiling.
  • Proven incident management experience, including leading high-severity incidents and postmortems.
  • Strong troubleshooting skills across the full stack.
  • Excellent communication and collaboration skills.
