Senior Database Reliability Engineer (DBRE) (worldwide remote)

1 hour, 45 minutes ago
Full-time
Senior
DevOps and Infrastructure
CloudLinux

CloudLinux is a leading provider of the CloudLinux OS, a platform for Linux web hosting that offers next-level performance and security. With a focus on optimizing web hosting environments, CloudLinux helps service providers improve density, stability,...

IT Services
51-250
Founded 2009

Description

  • Own production PostgreSQL reliability, including HA design, Patroni, PgBouncer, replication, failover, upgrades, vacuum and bloat control, query tuning, locks, indexes, capacity, backups, PITR, and restore validation.
  • Improve disaster recovery by maintaining tested restores, documented recovery paths, measurable RTO/RPO targets, runbooks, and safe maintenance plans.
  • Support and troubleshoot the wider database estate, including ClickHouse, MongoDB, and Redis, while improving monitoring and access/data-safety controls.
  • Automate DBA workflows using Ansible, Terraform/OpenTofu, GitLab CI/CD, scripts, and reproducible runbooks for provisioning, grants, backups, restores, health checks, and ownership metadata.
  • Help build DBaaS-style self-service capabilities so engineering teams can request databases, access, credentials, and operational checks with less manual DBA intervention.
  • Improve observability and incident response through Grafana, metrics, logs, SLOs, alert rules, Opsgenie routing, and clear communication during production issues.
  • Work closely with engineering teams to reduce repeated DBA tickets and improve reliability, safety, and operational resilience.
  • Learn and operate the existing production ClickHouse environment safely and effectively.
  • Maintain clear documentation, evidence, and ownership for database incidents and recovery processes.

Requirements

  • Deep hands-on PostgreSQL experience in business-critical production environments, typically 5+ years or equivalent depth.
  • Strong understanding of PostgreSQL internals and operations, including MVCC, WAL, transactions, locks, indexes, query planning, replication, autovacuum, bloat, major upgrades, backups, PITR, and restore testing.
  • Proven experience with highly available databases and the ability to reason about quorum, split-brain risk, failover, rollback, and recovery.
  • Strong Linux and infrastructure fundamentals, including systemd, networking, storage, filesystems, CPU/memory/disk bottlenecks, TLS, DNS, firewalls, and root-cause troubleshooting.
  • Automation skills with Ansible and scripting.
  • Terraform/OpenTofu, GitLab CI/CD, and merge-request based delivery are strong advantages.
  • Ability to support more than one database engine and learn ClickHouse quickly even without day-one expertise.
  • Practical use of AI engineering assistants such as Claude and Codex, with careful personal verification of generated SQL, commands, scripts, and conclusions.
  • Clear written English for asynchronous work in Jira, Slack, GitLab, Slite, and runbooks.
  • ClickHouse operations experience, including replication, Keeper/ZooKeeper, MergeTree engines, distributed DDL, grants, row policies, backups, query troubleshooting, and cluster recovery, is preferred.
  • MongoDB replica sets and Percona Backup for MongoDB experience is preferred.
  • Redis/Sentinel and broker/cache failure mode experience is preferred.
  • Experience with database observability, SLOs, golden signals, alert tuning, and executable incident runbooks is preferred.
  • Experience building internal platforms, self-service portals, or DBaaS workflows is preferred.

Benefits

  • Fully remote work with flexible working hours and the ability to work from any location worldwide.
  • 24 days of paid vacation per year, plus 10 days of national holidays and unlimited sick leave.
  • Compensation for private medical insurance.
  • Co-working and gym/sports reimbursement.
  • Budget for education and professional development.
  • Opportunity to receive a reward for the most innovative idea that the company can patent.
  • Interesting and challenging projects with real production impact.
