Database Reliability Engineer - Core Team

9 hours, 45 minutes ago
Full-time
Senior
DevOps and Infrastructure
ClickHouse

ClickHouse provides a fast, open-source, column-oriented database management system that lets users generate real-time analytical reports with SQL queries, serving industries that need efficient data processing and analysis.

IT Services
51-250
Founded 2021
$300M raised

Description

  • Continuously improve the reliability and performance of ClickHouse Core.
  • Create and improve metrics and alerts to detect and prevent production issues before they affect customers.
  • Investigate common customer-facing problems in ClickHouse Core to identify root causes and propose bug fixes and improvements.
  • Enhance incident response processes and lead post-mortem analysis for ClickHouse Core outages.
  • Work with support and cloud teams to communicate with impacted customers during incidents.
  • Plan, enable, and drive chaos engineering initiatives across engineering teams.
  • Manage on-call processes for performance and reliability issues and establish escalation best practices.
  • Collaborate with Control Plane, Data Plane, Security, Support, and Operations teams to implement ClickHouse effectively for customers.

Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related field.
  • At least 5 years of experience in Reliability Engineering, QA, or customer-facing engineering.
  • Experience operating ClickHouse or other SQL databases in production.
  • Strong understanding of distributed database internals and SQL; ClickHouse knowledge is a major plus.
  • Scripting experience with Shell or Python.
  • Ability to read and understand C++ code.
  • Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
  • Strong problem-solving and production debugging skills.
  • Excellent communication skills.
  • High level of responsibility, ownership, and accountability.
  • Experience working in a fast-paced global team environment is preferred.

Benefits

  • Remote-friendly flexible work environment with global hiring presence in 20 countries.
  • Employer contributions toward healthcare.
  • Equity in the company through stock options for new team members.
  • Flexible time off in the US and generous time off in other countries.
  • $500 home office setup support for remote employees.
  • Opportunities to connect with colleagues through company-wide global gatherings.
  • Salary range may include premium market adjustments in locations such as the San Francisco Bay Area and New York City Metro Area.
