ClickHouse

ClickHouse provides a fast, open-source, column-oriented database management system that lets users generate real-time analytical reports with SQL queries.

Industry: IT Services
Company size: 51-250 employees
Founded: 2021
Funding: $300M raised

Description

  • Lead reliability and operations for ClickHouse’s Postgres integration, including upgrades, patching, maintenance, and scaling.
  • Design and implement automation for provisioning, deployments, and service lifecycle management across AWS, GCP, and Azure.
  • Develop infrastructure-as-code using Terraform and modern CI/CD tooling to support consistent deployments.
  • Build and maintain Go-based tooling and services that improve automation, observability, and developer experience.
  • Own observability and monitoring across environments, including alerting, metrics, and tracing.
  • Drive incident management and postmortem practices to improve reliability and continuous learning.
  • Collaborate with platform, networking, and product teams to improve service operability.
  • Mentor and enable engineers as the platform and customer base grow.
  • Influence architecture and operational practices for ClickHouse’s cloud database platform.

Requirements

  • 7+ years of experience in SRE, DevOps, or infrastructure engineering.
  • Experience running distributed, production-grade systems.
  • Solid understanding of Postgres operations, scaling, and performance tuning.
  • Deep hands-on experience with AWS and exposure to GCP and Azure.
  • Experience navigating multi-cloud topologies.
  • Proficiency with Terraform, Kubernetes, and container-based infrastructure.
  • Strong Go development skills, or willingness to write and own production Go code.
  • Familiarity with observability tools such as Prometheus, Grafana, Loki, and OpenTelemetry, or equivalents.
  • Strong understanding of SLOs, incident response, and continuous improvement in service reliability.
  • Founder’s mentality with a hands-on, resourceful approach and willingness to dive deep to get things done.

Benefits

  • Typical US starting salary of $140,000 to $208,000 USD.
  • Typical US premium market starting salary of $155,000 to $230,000 USD.
  • Remote-friendly, globally distributed work environment with operations in over 20 countries.
  • Employer contributions toward healthcare.
  • Equity in the company through stock options for new team members.
  • Flexible time off in the US and generous time off in other countries.
  • $500 home office setup allowance for remote employees.
  • Opportunities to connect through company-wide global gatherings and offsites.
