Senior Site Reliability Engineer

1 hour, 18 minutes ago
Full-time
Senior
DevOps and Infrastructure
Honeycomb.io

Honeycomb.io provides a comprehensive observability platform designed for engineers to effectively debug and monitor distributed services, including microservices and serverless applications, facilitating collaborative problem-solving and enhancing ove...

Internet Software & Services
51-250
Founded 2016
$149M raised

Description

  • Help scale backend systems to support Honeycomb’s highest-volume customers.
  • Work with backend teams to analyze and optimize infrastructure and the broader stack.
  • Build organizational trust through transparent communication and direct, kind feedback.
  • Train as an Incident Commander and help train others in the role.
  • Support and help develop a healthy cross-Atlantic engineering culture.
  • Participate in the EU side of the team’s follow-the-sun on-call rotation.
  • Help the organization balance reliability with other business goals and priorities.
  • Optionally represent Honeycomb externally through blog posts, conference talks, and presentations with DevRel support.

Requirements

  • Strong experience in AWS and Kubernetes.
  • Experience performing cost analysis and cost reduction.
  • Solid experience with Helm, Terraform, and CI/CD.
  • Project management skills.
  • Software engineering experience; Golang is a plus.
  • Performance engineering experience is a plus.
  • Experience with Kafka or another high-volume distributed system.
  • Excellent written and spoken communication skills, including tailoring communication to the audience and giving direct feedback.
  • Familiarity with observability concepts such as SLOs and instrumentation, plus data-driven decision making.
  • Comfort operating in ambiguity with a bias for action and experimentation.
  • Interest in both the technical and human sides of reliability engineering.
  • Experience working in geographically distributed teams.
  • Please note that Honeycomb cannot currently sponsor or support visa transfers.
  • All hires must verify identity and eligibility to work.

Benefits

  • Base salary of €140,590 to €165,400, depending on experience.
  • Generous equity with an employee-friendly stock program.
  • Transparent pay levels based on experience.
  • Unlimited PTO.
  • Home office, co-working, and internet stipend.
  • Full benefits coverage for employees, with additional coverage available for dependents.
  • Up to 16 weeks of paid parental leave, regardless of path to parenthood.
  • Annual development allowance.
