TherapyNotes

TherapyNotes is a comprehensive practice management software designed for behavioral health practitioners. It offers a range of features including secure record management, appointment scheduling, note writing, and billing services. Developed by a husb...

Health Care Providers & Services
51-250
Founded 2010

Description

  • Design, implement, and maintain high-availability PostgreSQL database systems for a growing 24x7 SaaS platform.
  • Improve database service reliability through monitoring, alerting, SLO-oriented metrics, and operational readiness.
  • Lead and support incident response, root cause analysis, and post-incident corrective actions for database production events.
  • Partner with developers, operations, infrastructure, and technical leaders to ensure new systems are supportable and maintainable.
  • Provide escalated technical guidance and support to other technology teams across the organization.
  • Provide on-call coverage for production support and related duties as required.
  • Ensure database platform activities comply with HIPAA security policies and organizational security standards.
  • Build and continuously improve database observability dashboards, alerts, and service-level views in Datadog or similar tools.
  • Automate maintenance tasks and manage infrastructure as code using tools such as Bash, PowerShell, Python, Ansible, and Terraform.

Requirements

  • BS degree in Information Systems, Engineering, or equivalent experience.
  • 7-10+ years of engineering experience in Database Engineering, Systems Engineering, DevOps, and/or SRE.
  • Strong skill set in managing PostgreSQL in a Linux environment.
  • Experience with cloud-based compute, storage, and containerization solutions; Azure and Kubernetes preferred.
  • Expertise with an observability/monitoring platform such as Prometheus, Grafana, New Relic, or Datadog.
  • Experience operating production services in Agile/DevOps environments and using ITSM practices where applicable.
  • Ability to work with PostgreSQL ecosystem components such as PgBouncer, pgBackRest, HAProxy, and repmgr is a plus.
  • Experience writing and designing ETL pipelines using Python is a plus.
  • Some exposure to Terraform is a plus.
  • Excellent communication and interpersonal skills.

Benefits

  • Competitive salary of $120,000-$160,000.
  • Employer-sponsored health, dental, vision, life, and disability insurance.
  • Retirement plan with company contribution.
  • Annual company profit sharing.
  • Personal development and training budget.
  • Open, collaborative work environment.
  • Extensive 2-week onboarding plan.
  • Comprehensive mentorship program.

Interested in this position?

Apply directly on the company website

Similar Roles

Staff Site Reliability Engineer, Production Engineering

Dropbox 1K-5K Internet Software & Services

Dropbox is hiring a Site Reliability Engineer to shape company-wide reliability strategy for AI-assisted and agentic software development while improving stability, observability, incident response, and operational excellence at scale.

Sr. Site Reliability Engineer III (6448)

MetroStar 251-1K IT Services

MetroStar is hiring a Sr. Site Reliability Engineer III to support mission-critical federal government systems by ensuring reliable, secure, and scalable application operations across modern infrastructure environments.

Ansible AWS Bash CI/CD Kubernetes Load Balancing

Senior Site Reliability Engineer

Honeycomb.io 51-250 Internet Software & Services

Honeycomb is hiring a Senior Site Reliability Engineer to help scale backend systems, improve reliability, and support distributed engineering operations for a fast-growing observability platform.

AWS CI/CD Go Helm Kafka Kubernetes Terraform

Senior Production Engineer

Veeam Software 1K-5K Internet Software & Services

Veeam is hiring a Senior Production Engineer to design and operate reliable, scalable production systems for its Data Cloud platform and to lead improvements in incident response, automation, observability, and operational excellence.

Azure C# CI/CD Elasticsearch Go Grafana Java JavaScript OpenTelemetry Prometheus TypeScript