Blink Health

Blink Health is a digital health company revolutionizing the prescription medication industry by providing affordable and accessible medications to millions of people across America. Its cloud-based pharmacy platform eliminates traditional roadblocks...

Health Care Providers & Services
251-1K
Founded 2014
$165M raised

Description

  • Establish and evolve SRE best practices across reliability, incident response, postmortems, error budgets, and operational readiness.
  • Define and drive the observability strategy, including SLIs/SLOs, alerting quality, dashboards, and service health indicators.
  • Design and implement software-driven infrastructure solutions that automate manual work and reduce operational toil.
  • Act as a technical leader and influence priorities across cloud infrastructure, reliability tooling, and platform architecture.
  • Own large, ambiguous initiatives from concept to delivery while aligning stakeholders across engineering, security, and product.
  • Improve platform resilience, scalability, performance, and compliance through infrastructure and security-focused engineering work.
  • Identify systemic risks and reliability gaps early and lead platform upgrades and architectural improvements.
  • Partner with engineering teams to improve developer workflows, tooling, and operational maturity.
  • Provide technical mentorship, architecture guidance, and high-quality design and code reviews.
  • Lead documentation and knowledge sharing so that systems and processes do not depend on any single owner.
  • Participate in and help mature incident response, escalation practices, and post-incident learning.

Requirements

  • Bachelor’s or Master’s degree in Computer Science or equivalent practical experience.
  • 10+ years of experience in site reliability engineering, infrastructure engineering, or platform engineering roles with demonstrated impact at scale.
  • Expert-level troubleshooting across the full stack, from application to kernel to network.
  • Strong command-line proficiency and deep expertise in Linux systems and operating system fundamentals.
  • Advanced understanding of networking concepts including load balancing, proxies, DNS, TCP/IP, NAT, and service-to-service communication.
  • Experience with multiple languages such as Python, Go, and Bash, plus familiarity with troubleshooting application stacks such as React.
  • Strong track record of automating repetitive and complex operational work to reduce toil and increase reliability.
  • Ability to design and build internal tools in Python or Go that standardize and scale engineering practices.
  • Deep experience with cloud platforms, preferably AWS, with GCP or Azure also acceptable.
  • Strong expertise in Kubernetes and container orchestration, including EKS and Helm.
  • Proven experience designing and implementing observability systems, including metrics, logging, tracing, dashboards, and alerting.
  • Deep understanding of container technologies, security scanning, secrets management, dynamic configuration, and microservices architectures.
  • Familiarity with service meshes and advanced traffic management concepts.
  • Experience designing and maintaining company-wide infrastructure-as-code codebases using Terraform, Pulumi, CloudFormation, or Ansible.
  • Ability to think holistically about infrastructure design, cost, reliability, security, and long-term maintainability.
  • Comfort operating in an agile environment with disciplined testing and quality practices.

Benefits

  • Opportunity to work on products that improve prescription access and affordability for millions of patients.
  • High-impact role at a fast-growing healthcare technology company.
  • Collaborative, cross-functional team environment.
  • Equal opportunity employer committed to diversity.
  • SMS or MMS application status updates for applicants who opt in.
