Senior Cloud Performance Engineer

1 hour, 42 minutes ago
Full-time
Senior
DevOps and Infrastructure
ClickHouse

ClickHouse provides a fast, open-source, column-oriented database management system that lets users generate real-time analytical reports with SQL queries, serving industries that need efficient data processing and analysis.

IT Services
51-250
Founded 2021
$300M raised

Description

  • Benchmark system and database performance, including capacity sizing and optimization.
  • Troubleshoot and debug application and server errors, working from logs and related diagnostics.
  • Recommend configuration tuning and optimizations to resolve performance bottlenecks.
  • Work closely with the core development, cloud, and security teams to improve ClickHouse Cloud performance.
  • Plan, enable, and drive chaos engineering initiatives across engineering teams.
  • Develop, deploy, and manage tools to run chaos experiments and measure their impact.
  • Extend backend systems to support chaos engineering techniques.
  • Observe running systems and identify innovative ways to disrupt them for resilience testing.
  • Study and improve software resilience, as well as operational and delivery practices.
  • Build and operate performance tooling for large-scale distributed systems.

Requirements

  • 6+ years of relevant software development experience building and operating scalable, fault-tolerant distributed systems.
  • Software development experience in Go, C/C++, Java, or similar languages.
  • Experience with concurrency, multithreading, and distributed system architectures.
  • Experience developing cloud infrastructure services, preferably with Kubernetes.
  • Experience leading and shipping large technical projects with multiple experienced engineers.
  • Experience with a public cloud provider such as AWS, GCP, or Azure, including infrastructure services like EC2.
  • Strong production debugging and problem-solving skills.
  • Excellent communication skills and ability to work well across engineering teams.
  • Passion for efficiency, availability, scalability, and data governance.
  • High level of responsibility, ownership, and accountability.

Benefits

  • Competitive compensation with location-based salary ranges.
  • Employer contributions toward healthcare.
  • Equity in the company through stock options for new hires.
  • Flexible time off, with generous entitlement in some countries.
  • Remote-friendly flexible work environment across more than 20 countries.
  • $500 home office setup allowance for remote employees.
  • Opportunities to connect through company-wide global gatherings and offsites.
