Headout

Headout is an on-demand mobile marketplace that offers travelers access to the best tours, attractions, events, and local experiences at discounted prices. With a focus on curated experiences, Headout provides a one-stop solution for discovering and bo...

Consumer Services
251-1K
Founded 2015
$66M raised

Description

  • Manage and optimize Kubernetes clusters and their workloads across cloud infrastructure.
  • Build and maintain CI/CD pipelines and reusable workflows, including canary release processes.
  • Design service-level dashboards, fine-tune alerts, and manage incidents across the organization.
  • Improve application performance by rolling out backend changes that speed up APIs and pages, improve database efficiency, and resolve bottlenecks.
  • Architect and build scalable platform tools for cross-pod use cases.
  • Develop tools and workflows that improve developer velocity and engineering efficiency.
  • Build guardrails for security practices and help standardize them across the organization.
  • Collaborate with and mentor junior engineers, drive root-cause analyses, and promote best practices.
  • Work across DevOps, observability, application performance, and related platform areas.

Requirements

  • 4-7 years of experience operating customer-facing services at scale.
  • Proficiency in operating, debugging, and optimizing Kubernetes clusters and workloads.
  • Experience with service mesh and tracing tools such as Istio and Jaeger.
  • Comfort working with any cloud provider, preferably AWS.
  • Hands-on experience with monitoring and alerting stacks such as Prometheus, Grafana, Thanos, New Relic, or Datadog.
  • Experience designing robust CI/CD workflows in tools such as GitHub, GitLab, or Jenkins.
  • Proficiency with infrastructure as code using Terraform or Pulumi.
  • Fluency in Python, Go, or Java/Kotlin, plus shell scripting.
  • Experience working with databases such as MySQL or MongoDB.
  • Ability to profile applications, database queries, and traces.
  • Understanding of security best practices and compliance requirements.
  • High agency and a proactive approach to identifying and fixing issues.
  • Interest in travel, local experiences, and hospitality is a bonus.
  • Experience in a rapidly growing startup is a bonus.
  • Anything out of the box that can surprise the team is a bonus.

Benefits

  • Work at a profitable, fast-growing company with $130M in revenue and guests in 100+ cities.
  • Opportunity to influence architecture decisions and the evolution of the stack.
  • High-impact work that improves deployment turnaround time and p99 performance metrics.
  • Flexibility to work across different stacks, tools, and platforms.
