Abacus Insights

Abacus Insights simplifies healthcare data with intelligent solutions, unlocking data value and empowering health plans, consumers, and providers.

Insurance

Financials

51-250 (150)

Founded 2017

$82M raised

5 open positions

Principal Site Reliability & Forward Deployed Engineer

3 weeks, 1 day ago

United States

Full-time

Lead

Site Reliability Engineer (SRE)

Software Development

Apache Spark AWS CI/CD Databricks Kubernetes Python Snowflake

Description

Act as a senior technical escalation point during production incidents and customer-impacting issues.
Lead real-time incident triage, mitigation, recovery, and root cause analysis.
Own post-launch reliability, stability, and operational quality of core systems.
Investigate and resolve complex production defects, field issues, and escalations.
Translate fixes and incident learnings into durable product, platform, and operational improvements.
Support strategic customers with deployments, integrations, and production-grade technical challenges.
Troubleshoot AWS-hosted systems, including compute, storage, networking, IAM, and security.
Debug Databricks jobs, clusters, and Spark-based pipelines, including performance, scalability, and data-correctness problems.
Write production-quality code and automation to improve reliability, observability, and operational efficiency.
Provide technical leadership, mentor engineers, and collaborate across Product, Engineering, Data, and Customer teams.

Requirements

10+ years of experience in software engineering, SRE, sustaining engineering, or production operations.
Deep hands-on experience operating production systems in AWS.
Strong experience troubleshooting Databricks and large-scale data platforms.
Proficiency in Python and experience building production services or tooling.
Strong understanding of distributed systems, incident management, RCA, monitoring, alerting, observability, and CI/CD pipelines using Infrastructure as Code.
Proven ability to own problems end-to-end from detection to permanent resolution.
Excellent communication skills, especially during incidents and customer escalations.
Ability to work backward from customer impact to root cause across systems and codebases in environments with minimal documentation.
Strong instinct for operational risk and for proactively identifying failure modes before they impact customers.
Experience in healthcare, health insurance, or regulated data environments is preferred.
Familiarity with Kubernetes (EKS), EMR, Lambda, Spark internals, and Snowflake or similar data warehouses is preferred.
Experience with FHIR, MDM systems, or entity resolution is preferred.
Prior SWAT, escalation engineering, or tiger-team experience is preferred.
Experience contributing to or operating within SRE/on-call programs is preferred.

Benefits

Base salary plus eligibility for performance bonuses and equity grants.
Unlimited paid time off.
Work from anywhere flexibility.
Comprehensive health coverage with multiple plan options.
Equity for every employee.
Growth-focused environment with development support.
Home office setup allowance.
Monthly cell phone allowance.

Interested in this position?

Apply directly on the company website

Similar Roles

Solutions Engineer - Financial Institutions

Yuno 51-200 Payment Processing Software

Yuno is hiring a Solutions Engineer to support banks and financial institutions across Europe, Italy, and Portugal by leading technical integrations and helping them adopt the company’s payment platform in complex regulated environments.

Europe Italy Portugal Full-time Lead Solutions Engineer

REST API

4 hours, 2 minutes ago

Forward Deployed Engineer (FDE)

Maneva 11-50 Automation Machinery Manufacturing

Maneva is hiring a Forward Deployed Engineer to implement and support AI-powered computer vision systems for manufacturing customers, with the goal of improving production quality and throughput through on-site technical ownership.

United States Canada Full-time Senior AI Engineer Solutions Engineer

AWS Azure Computer Vision Docker ERP GCP Git Linux MLOps Python TCP/IP

4 hours, 29 minutes ago

Senior Site Reliability Engineer

Anduril Industries 1K-5K Aerospace & Defense

Anduril Industries is hiring a Site Reliability Engineer for its Mission Autonomy team to support the reliability and operational excellence of autonomous systems used across cloud, hardware-in-the-loop, and air-gapped environments.

United States Full-time Senior Site Reliability Engineer (SRE)

$166k-$220k

Ansible AWS Azure DNS Docker GCP Go HTTP Kubernetes Linux Load Balancing Puppet Python Splunk TCP/IP Terraform

4 hours, 44 minutes ago

Senior Solutions Engineer | California | Remote

Grafana 1K-5K IT Services

Grafana Labs is hiring an Enterprise Solutions Engineer to partner with sales and customers on technical pre-sales, product education, and opportunity closure in a fast-growing, remote-first observability company.

United States Full-time Senior Solutions Engineer

$204k-$254k

Grafana

7 hours, 19 minutes ago
