Blink Health

Blink Health is a digital health company revolutionizing the prescription medication industry by providing affordable and accessible medications to millions of people across America. Its cloud-based pharmacy platform eliminates traditional roadblocks...

Health Care Providers & Services
251-1K employees
Founded 2014
$165M raised

Description

  • Establish and evolve SRE best practices across reliability, incident response, postmortems, error budgets, and operational readiness.
  • Define and drive the observability strategy, including SLIs/SLOs, alerting quality, dashboards, and service health indicators.
  • Design and implement software-driven infrastructure solutions that automate manual work and reduce operational toil.
  • Act as a technical leader and influence priorities across cloud infrastructure, reliability tooling, and platform architecture.
  • Own large, ambiguous initiatives from concept to delivery while aligning stakeholders across engineering, security, and product.
  • Improve platform resilience, scalability, performance, and compliance through infrastructure and security-focused engineering work.
  • Identify systemic risks and reliability gaps early and lead platform upgrades and architectural improvements.
  • Partner with engineering teams to improve developer workflows, tooling, and operational maturity.
  • Provide technical mentorship, architecture guidance, and high-quality design and code reviews.
  • Lead documentation and knowledge sharing so that systems and processes do not depend on any single owner.
  • Participate in and help mature incident response, escalation practices, and post-incident learning.

Requirements

  • Bachelor’s or Master’s degree in Computer Science or equivalent practical experience.
  • 10+ years of experience in site reliability engineering, infrastructure engineering, or platform engineering roles with demonstrated impact at scale.
  • Expert-level troubleshooting across the full stack, from application to kernel to network.
  • Strong command-line proficiency and deep expertise in Linux systems and operating system fundamentals.
  • Advanced understanding of networking concepts including load balancing, proxies, DNS, TCP/IP, NAT, and service-to-service communication.
  • Experience with multiple languages such as Python, Go, and Bash, plus familiarity with troubleshooting application stacks such as React.
  • Strong track record of automating repetitive and complex operational work to reduce toil and increase reliability.
  • Ability to design and build internal tools in Python or Go that standardize and scale engineering practices.
  • Deep experience with cloud platforms, preferably AWS, with GCP or Azure also acceptable.
  • Strong expertise in Kubernetes and container orchestration, including EKS and Helm.
  • Proven experience designing and implementing observability systems, including metrics, logging, tracing, dashboards, and alerting.
  • Deep understanding of container technologies, security scanning, secrets management, dynamic configuration, and microservices architectures.
  • Familiarity with service meshes and advanced traffic management concepts.
  • Experience designing and maintaining company-wide infrastructure-as-code codebases using Terraform, Pulumi, CloudFormation, or Ansible.
  • Ability to think holistically about infrastructure design, cost, reliability, security, and long-term maintainability.
  • Comfort operating in an agile environment with disciplined testing and quality practices.

Benefits

  • Opportunity to work on products that improve prescription access and affordability for millions of patients.
  • High-impact role at a fast-growing healthcare technology company.
  • Collaborative, cross-functional team environment.
  • Equal opportunity employer committed to diversity.
  • SMS or MMS application status updates for applicants who opt in.
