Database Reliability Engineer - Core Team

12 hours, 16 minutes ago
Full-time
Senior
DevOps and Infrastructure
ClickHouse

ClickHouse provides a fast, open-source, column-oriented database management system that lets users generate real-time analytical reports with SQL queries, serving industries that need efficient data processing and analysis.

Industry: IT Services
Company size: 51-250 employees
Founded: 2021
Funding: $300M raised

Description

  • Build and lead processes that improve the reliability, availability, scalability, and performance of ClickHouse Core.
  • Collaborate with Control Plane, Dataplane, Security, Support, and Operations teams to implement ClickHouse effectively for customers.
  • Own engineering escalation management, incident response, investigations, and postmortem analysis for core-related issues.
  • Run blameless postmortems and drive continuous improvement in how ClickHouse is run and optimized in the cloud.
  • Create and refine metrics and alerts to detect and prevent production issues before they affect customers.
  • Investigate common customer problems, identify root causes, and submit bug fixes, issue reports, and improvement suggestions.
  • Enhance incident response processes and communicate with impacted customers during outages.
  • Plan, enable, and drive chaos engineering initiatives across engineering teams.
  • Manage on-call processes and establish best practices for escalation and issue resolution.

Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related field.
  • At least 5 years of experience in Reliability Engineering, QA, or customer-facing engineering.
  • Previous experience operating ClickHouse or other SQL databases in production.
  • Strong understanding of distributed database internals and SQL; ClickHouse experience is a major plus.
  • Scripting experience with Shell or Python.
  • Ability to read and understand C++ code.
  • Knowledge of cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
  • Solid production debugging and problem-solving skills.
  • Excellent communication skills.
  • Experience working in a fast-paced global team with strong ownership and accountability.

Benefits

  • Remote-friendly, flexible work environment; the role can be based remotely in the United Kingdom, Germany, or the Netherlands.
  • Employer contributions toward healthcare.
  • Stock options for every new team member.
  • Flexible time off in the US and a generous time-off entitlement in other countries.
  • $500 home office setup allowance for remote employees.
  • Opportunities to attend company-wide Global Gatherings and offsites.
