Headout

Headout is an on-demand mobile marketplace that offers travelers access to the best tours, attractions, events, and local experiences at discounted prices. With a focus on curated experiences, Headout provides a one-stop solution for discovering and bo...

Consumer Services
251-1K employees
Founded 2015
$66M raised

Description

  • Manage and optimize Kubernetes clusters and their workloads across cloud infrastructure.
  • Build and maintain CI/CD pipelines and reusable workflows, including canary release processes.
  • Design service-level dashboards, fine-tune alerts, and manage incidents across the organization.
  • Improve application performance by rolling out backend changes that speed up APIs and pages, improve database efficiency, and resolve bottlenecks.
  • Architect and build scalable platform tools for cross-pod use cases.
  • Develop tools and workflows that improve developer velocity and engineering efficiency.
  • Build guardrails for security practices and help standardize them across the organization.
  • Collaborate with and mentor junior engineers, drive root-cause analyses, and promote best practices.
  • Work across DevOps, observability, application performance, and related platform areas.

Requirements

  • 4-7 years of experience operating customer-facing services at scale.
  • Proficiency in operating, debugging, and optimizing Kubernetes clusters and workloads.
  • Experience with service mesh and tracing tools such as Istio and Jaeger.
  • Comfort working with any cloud provider, preferably AWS.
  • Hands-on experience with monitoring and alerting stacks such as Prometheus, Grafana, Thanos, New Relic, or Datadog.
  • Experience designing robust CI/CD workflows in tools such as GitHub, GitLab, or Jenkins.
  • Proficiency with infrastructure as code using Terraform or Pulumi.
  • Fluency in Python, Go, or Java/Kotlin, plus shell scripting.
  • Experience working with databases such as MySQL or MongoDB.
  • Ability to profile applications, database queries, and traces.
  • Understanding of security best practices and compliance requirements.
  • High agency and a proactive approach to identifying and fixing issues.
  • Interest in travel, local experiences, and hospitality is a bonus.
  • Experience in a rapidly growing startup is a bonus.
  • Anything out of the box that can surprise the team is a bonus.

Benefits

  • Work at a profitable, fast-growing company with $130M in revenue and guests in 100+ cities.
  • Opportunity to influence architecture decisions and the evolution of the stack.
  • High-impact work that improves deployment turnaround time and p99 performance metrics.
  • Flexibility to work across different stacks, tools, and platforms.
