Spotify

Spotify is a leading global audio streaming service empowering artists and inspiring fans worldwide with over 70 million tracks. It has 365 million users, including 165 million subscribers, across 178 markets.

Media
Founded 2006

Description

  • Own the reliability, security, and scalability strategy for Portal’s SaaS infrastructure and runtime environments.
  • Define service level objectives, drive capacity planning, and ensure systems meet product growth demands.
  • Design and evolve infrastructure on GCP and AWS using Terraform and infrastructure-as-code patterns.
  • Shape environment design for non-deterministic AI workloads, including sandboxing, resource isolation, cost governance, and security boundaries.
  • Evolve incident management, on-call, and postmortem practices to improve operational excellence.
  • Use AI assistants to speed up root cause analysis and develop self-healing capabilities in production systems.
  • Operate and troubleshoot reliability issues across a modern web stack including TypeScript, React, and Python.
  • Establish SRE best practices, run production-readiness reviews, and mentor engineers on operational thinking.
  • Partner with engineering and product leadership to translate operational insights into roadmap priorities.

Requirements

  • 5+ years of hands-on experience operating cloud infrastructure at scale using GCP and/or AWS, Terraform, and Kubernetes.
  • Practical experience or strong interest in LLM-based systems, RAG pipelines, or agentic workloads.
  • Understanding of reliability challenges in non-deterministic systems.
  • Distributed systems thinking grounded in consistency, availability, and partition tolerance.
  • Proficiency in at least one modern language: TypeScript, Java, Go, or Python.
  • Comfort navigating large, heterogeneous codebases, including environments with AI-generated pull requests.
  • Ability to build automation that removes recurring operational issues over time.
  • Strong communication skills for explaining infrastructure trade-offs to technical and non-technical stakeholders.
  • Experience writing postmortems that drive meaningful change.

Benefits

  • Base salary range of $164,448–$234,926 USD plus equity.
  • Health insurance.
  • Six-month paid parental leave.
  • 401(k) retirement plan.
  • Monthly meal allowance.
  • 23 paid days off.
  • Paid flexible holidays.
  • Paid sick leave.
  • Flexible remote work with some in-person meetings in New York, NY.
