Headout

Headout is an on-demand mobile marketplace that offers travelers access to the best tours, attractions, events, and local experiences at discounted prices. With a focus on curated experiences, Headout provides a one-stop solution for discovering and bo...

Consumer Services
251-1K employees
Founded 2015
$66M raised

Description

  • Manage and optimize Kubernetes clusters and their workloads across cloud infrastructure.
  • Build and maintain CI/CD pipelines and reusable workflows, including canary release processes.
  • Design service-level dashboards, fine-tune alerts, and manage incidents across the organization.
  • Improve application performance by rolling out backend changes that speed up APIs and pages, make database access more efficient, and resolve bottlenecks.
  • Architect and build scalable platform tools for cross-pod use cases.
  • Develop tools and workflows that improve developer velocity and engineering efficiency.
  • Build guardrails for security practices and help standardize them across the organization.
  • Collaborate with and mentor junior engineers, drive root-cause analyses, and promote best practices.
  • Work across DevOps, observability, application performance, and related platform areas.

Requirements

  • 4-7 years of experience operating customer-facing services at scale.
  • Proficiency in operating, debugging, and optimizing Kubernetes clusters and workloads.
  • Experience with service mesh and tracing tools such as Istio and Jaeger.
  • Comfort working with any cloud provider, preferably AWS.
  • Hands-on experience with monitoring and alerting stacks such as Prometheus, Grafana, Thanos, New Relic, or Datadog.
  • Experience designing robust CI/CD workflows in tools such as GitHub, GitLab, or Jenkins.
  • Proficiency with infrastructure as code using Terraform or Pulumi.
  • Fluency in Python, Go, or Java/Kotlin, plus shell scripting.
  • Experience working with databases such as MySQL or MongoDB.
  • Ability to profile applications, database queries, and traces.
  • Understanding of security best practices and compliance requirements.
  • High agency and a proactive approach to identifying and fixing issues.
  • Interest in travel, local experiences, and hospitality is a bonus.
  • Experience in a rapidly growing startup is a bonus.
  • Any out-of-the-box skill or experience that can surprise the team is a bonus.

Benefits

  • Work at a profitable, fast-growing company with $130M in revenue and guests in 100+ cities.
  • Opportunity to influence architecture decisions and the evolution of the stack.
  • High-impact work that improves deployment turnaround time and p99 performance metrics.
  • Flexibility to work across different stacks, tools, and platforms.
