Senior Site Reliability Engineer - Remote

15 hours, 1 minute ago
Full-time
Senior
DevOps and Infrastructure
ClickHouse

ClickHouse provides a fast, open-source, column-oriented database management system that lets users generate real-time analytical reports with SQL queries, serving industries that need efficient data processing and analysis.

IT Services
51-250
Founded 2021
$300M raised

Description

  • Collaborate with engineering teams to design and implement scalable, secure, highly available systems.
  • Build and lead processes that improve reliability, availability, scalability, and performance across ClickHouse Cloud.
  • Establish and manage service level objectives (SLOs) and service level agreements (SLAs).
  • Implement monitoring and alerting across infrastructure components to detect and resolve incidents quickly.
  • Own incident management, incident response, and blameless post-mortem analysis for outages.
  • Work with Support to communicate incident impacts and updates to affected customers.
  • Continuously improve the reliability and performance of ClickHouse services.
  • Plan, enable, and drive chaos engineering initiatives across engineering teams.
  • Manage on-call processes and escalation practices to minimize downtime.
  • Develop software platforms and tools that improve operational and engineering efficiency.

Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related field.
  • At least 8 years of experience in Site Reliability Engineering or a related field.
  • Hands-on experience with Go and/or Python.
  • Strong knowledge of cloud platforms such as AWS, Azure, or Google Cloud Platform.
  • Excellent understanding of distributed databases and SQL; experience with ClickHouse is a major plus.
  • Hands-on experience with container orchestration tools such as Kubernetes or Docker Swarm.
  • Strong experience with automation and configuration management tools such as Ansible, Terraform, or Puppet.
  • Strong problem-solving and production debugging skills.
  • High level of responsibility, ownership, and accountability.
  • Excellent communication and interpersonal skills.

Benefits

  • Remote-friendly, flexible work environment across 20 countries.
  • Employer contributions toward healthcare.
  • Stock options for every new team member.
  • Flexible time off in the US and generous time off in other countries.
  • $500 home office setup budget for remote employees.
  • Opportunities to join company-wide global gatherings and offsites.


