Capital Rx

Capital Rx provides comprehensive health benefit management and transparent pharmacy benefit management solutions, integrating various healthcare services to support millions of plan members across diverse sectors.

Health Care Providers & Services

Health Care

251-1K employees (450)

Founded 2017

48 open positions

Senior Scalability Engineer - Observability

1 month, 4 weeks ago

United States

Full-time

Lead

Platform Engineer

Software Development

Agile AWS ClickHouse Datadog Elasticsearch Flask Git Grafana Jaeger Lucene Microservices New Relic OpenSearch OpenTelemetry Prometheus Python React Rust Scrum Splunk SQL SQLAlchemy Terraform TypeScript Zipkin

Description

Define, own, and build the company-wide observability strategy, tooling, and platform products.
Architect, implement, and maintain the LGTM stack (Loki, Grafana, Tempo, Mimir) across engineering teams.
Build production-grade internal observability products with React/TypeScript frontends and Python/Rust backends.
Develop high-performance log indexing and search solutions for large-scale log data.
Design and implement SQL-based analytics workflows for ad hoc and historical log analysis.
Integrate AWS observability services with the custom observability platform to provide unified visibility.
Create dashboards, monitors, and alerting systems that reduce noise and detect anomalies.
Partner with engineering teams to establish logging, metrics, and tracing standards and instrument services effectively.
Lead workshops, create documentation, and build self-service tooling to drive observability adoption.
Mentor engineers, lead architecture reviews, and represent the Scalability team in cross-functional planning.

Requirements

10+ years of software engineering or infrastructure engineering experience with progression into technical leadership roles.
Several years of experience leading technical initiatives, building platform products, or serving as an observability subject matter expert.
Strong experience with React/TypeScript for frontend development and Python (Flask/SQLAlchemy) for backend services.
Deep production experience with the LGTM stack, including Loki, Grafana, Tempo, and Prometheus/Mimir.
Extensive experience with AWS CloudWatch Logs and Metrics, including custom metrics, Logs Insights, dashboards, and integrations.
Production experience with SQL-based log analytics using AWS Athena, DuckDB, or similar query engines.
Demonstrated ability to architect solutions using both managed cloud services and open-source tooling.
Hands-on experience with search and indexing systems such as OpenSearch, Elasticsearch, Lucene, or Tantivy.
Experience building high-performance systems that process millions of log lines or high-cardinality metrics.
Deep understanding of distributed systems and microservices architectures, and the observability challenges they create.
Proven track record handling high-volume structured and unstructured logging data and building efficient search/query solutions.
Ability to build internal platform products with strong attention to UX, performance, and reliability.
Production experience with Rust for high-performance data processing, indexing, or search systems preferred.
Experience with Terraform for observability infrastructure and AWS resources preferred.
Experience with Datadog, New Relic, Splunk, or other enterprise observability platforms preferred.
Deep expertise with PromQL and LogQL, and with optimizing SQL and other queries over high-cardinality data preferred.
Experience with Parquet, ORC, or other columnar storage formats for S3-based analytics preferred.
Experience designing incident response workflows, postmortems, and SLO/SLI frameworks preferred.
Track record of reducing observability costs while maintaining or improving capabilities preferred.
Experience with streaming data pipelines, ETL, or real-time data processing preferred.
Deep knowledge of OpenTelemetry, Jaeger, Zipkin, or distributed tracing architectures preferred.
Git expertise and experience working in a monorepo preferred.
Previous pharmacy benefit management (PBM) or healthcare technology experience preferred.
Experience building developer tools or internal platforms that improve engineering productivity preferred.

Benefits

Remote work location.
Salary range of $160,000 to $220,000 USD.
Equal employment opportunity and a workplace committed to diversity and inclusion.
Privacy notice and retention of application data for future consideration.

Similar Roles

Staff Operations Engineer

Mozilla 251-1K Internet Software & Services

Mozilla is hiring a Staff Operations Engineer to lead the design, reliability, and evolution of hybrid-cloud and workplace infrastructure across teams.

Canada Full-time Lead Infrastructure Engineer Site Reliability Engineer (SRE)

$86k-$127k

Ansible DNS Linux Puppet Python TCP/IP Unix

4 hours, 40 minutes ago

Platform Engineer

Ometria 51-250 Media

Ometria is hiring a remote Platform Engineer in Portugal to help build, scale, and maintain the cloud-based infrastructure and platform that supports its retail customer data and experience product.

Portugal Full-time Mid Level Platform Engineer

AWS CI/CD DevSecOps Docker Go Kafka Kubernetes Microservices PostgreSQL Python React Terraform

4 hours, 40 minutes ago

Principal Site Reliability Engineer (SRE)

Symmetrio Professional Services

Symmetrio is recruiting a Principal Site Reliability Engineer for a rapidly growing healthcare technology company to own the reliability, scalability, security, and performance of a mission-critical SaaS platform used by healthcare providers across the United States.

United States Full-time Lead Site Reliability Engineer (SRE)

Active Directory AWS CI/CD Datadog Django Grafana Kubernetes Python Terraform Windows Server

4 hours, 55 minutes ago

Performance Test Engineer Lead

PartnerOne 51-250 Media

An enterprise performance engineering role at a cloud-focused organization, responsible for validating the scalability, stability, and production readiness of distributed systems across Azure and hybrid environments.

Egypt Full-time Lead QA Engineer Site Reliability Engineer (SRE)

Azure CI/CD Kubernetes PowerShell

5 hours, 10 minutes ago