CloudLinux

CloudLinux is a leading provider of the CloudLinux OS, a platform for Linux web hosting that offers next-level performance and security. With a focus on optimizing web hosting environments, CloudLinux helps service providers improve density, stability,...

IT Services
51-250
Founded 2009

Description

  • Design and implement a self-service DBaaS platform using Terraform and Ansible for deploying highly available PostgreSQL, ClickHouse, MongoDB, and Redis clusters.
  • Build and operate database infrastructure across bare metal, OpenNebula, Kubernetes, and public cloud environments.
  • Manage and scale large ClickHouse analytics clusters, including sharding, replication, table engine optimization, and S3 backup pipelines.
  • Maintain and scale Apache Airflow and Redash infrastructure to support reliable ETL pipelines and analytics workflows.
  • Implement SRE practices for data management, including automated self-healing and defined SLOs and SLIs for databases.
  • Lead migration from legacy database solutions to modern cloud-native patterns and help evaluate Kubernetes operators for stateful workloads.
  • Serve as a technical authority for product teams on data schema design and SQL query optimization for high-load systems.
  • Collaborate with infrastructure and analytics teams to improve reliability, observability, and performance across the data platform.
  • Automate infrastructure and operational tasks with code to reduce manual intervention and repetitive work.

Requirements

  • 5+ years of deep PostgreSQL experience, including MVCC internals, locking mechanics, Patroni, PgBouncer, and major version upgrades under load.
  • Proven experience operating large ClickHouse clusters, including ZooKeeper or ClickHouse Keeper, sharding, replication internals, and performance troubleshooting.
  • Strong Terraform and Ansible experience, including writing complex modules and roles.
  • Programming experience in Python or Go for infrastructure and automation is a major plus.
  • Experience working in hybrid environments across bare metal, Kubernetes, and cloud platforms.
  • Understanding of database performance tuning, including NVMe and network storage optimization.
  • Systems-level thinking across networking, infrastructure, and application logic.
  • Knowledge of security and disaster recovery practices, including FIPS and audit logs.
  • Preferred: experience building an Internal Developer Platform (IDP).
  • Preferred: experience operating databases in Kubernetes using CloudNativePG or the Altinity Operator.
  • Preferred: experience working for cloud or hosting providers in similar service environments.

Benefits

  • Fully remote work with flexible working hours and the ability to work from anywhere worldwide.
  • 24 days of paid vacation per year.
  • 10 days off for national holidays.
  • Unlimited sick leave.
  • Private medical insurance coverage.
  • Co-working and gym/sports reimbursement.
  • Budget for education, training, and conferences.
  • Opportunity to be rewarded for innovative ideas that the company can patent.


