Database Reliability Engineer - Core Team

2 hours, 17 minutes ago
Full-time
Senior
DevOps and Infrastructure
ClickHouse

ClickHouse

ClickHouse provides a fast, open-source, column-oriented database management system that lets users generate real-time analytical reports with SQL queries, serving industries that need efficient data processing and analysis.

IT Services
51-250 employees
Founded 2021
$300M raised

Description

  • Continuously improve the reliability and performance of ClickHouse Core.
  • Create and improve metrics and alerts to detect and prevent production issues before they affect customers.
  • Investigate recurring customer problems, identify root causes, and submit bug fixes, issue reports, and improvement suggestions.
  • Enhance incident response processes and conduct post-mortem analysis for ClickHouse Core outages, including coordinating customer communication with the support and cloud teams.
  • Plan, enable, and drive chaos engineering initiatives across engineering teams based on internal priorities.
  • Manage on-call processes for performance and reliability issues and establish best practices for escalation and resolution.
  • Collaborate with Control Plane, Dataplane, Security, Support, and Operations teams to guide best practices for implementing ClickHouse for customers.
  • Own escalation management, response, investigations, blameless postmortems, and continuous improvement of cloud operations.

Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related field.
  • At least 5 years of experience in Reliability Engineering, QA, or customer-facing engineering.
  • Experience operating ClickHouse or other SQL databases in production.
  • Strong understanding of distributed database internals and SQL, with ClickHouse experience being a major plus.
  • Scripting experience with Shell or Python.
  • Ability to read and understand C++ code.
  • Knowledge of cloud platforms such as AWS, Azure, or Google Cloud Platform.
  • Strong problem-solving and production debugging skills.
  • Excellent communication skills.
  • High level of responsibility, ownership, and accountability.
  • Experience working in a fast-paced global team environment and partnering closely with the business.

Benefits

  • Flexible, remote-friendly work environment, with remote work available in the Netherlands, the UK, or Germany.
  • Employer contributions toward healthcare.
  • Equity in the company through stock options for new team members.
  • Flexible time off in the US and generous time off in other countries.
  • $500 home office setup budget for remote employees.
  • Opportunities to connect through company-wide global gatherings and offsites.
