Pinterest

Pinterest is the world's first visual discovery engine, offering a vast dataset of ideas with over 200 billion recipes, home hacks, and style inspiration. With a mission to inspire everyone to create a life they love, Pinterest empowers its employees t...

Internet Software & Services
5K-10K employees
Founded 2010

Description

  • Design and build AI agents that support production reliability work, including service health analysis, recommendations, migration playbooks, and risk identification.
  • Lead large-scale infrastructure modernization efforts, including Kubernetes adoption and platform transitions.
  • Transform consulting engagements and reliability patterns into reusable platforms, tools, automation, and self-service documentation.
  • Build the knowledge infrastructure for operational agents, including runbooks, incident patterns, migration playbooks, and best practices.
  • Develop software solutions that improve the reliability and operability of large-scale distributed systems.
  • Create tools, frameworks, and automation that reduce operational toil and overhead.
  • Develop meaningful SLIs that provide actionable signals of system health.
  • Automate critical engineering processes to improve deployment safety and speed at scale.
  • Partner with teams to plan and optimize capacity across public and private cloud environments.

Requirements

  • 5+ years of industry experience building and operating large-scale, high-performance distributed systems.
  • Bachelor's degree in Computer Science or related field, or equivalent experience.
  • Strong programming skills in Python or Go.
  • Deep knowledge of Linux/Unix internals.
  • Experience with open source infrastructure such as MySQL, Kafka, Envoy, or Hadoop.
  • Infrastructure as Code experience with tools such as Terraform, Puppet, Chef, Ansible, Docker, or Kubernetes.
  • Experience deploying web applications to cloud infrastructure such as AWS, GCP, or Azure.
  • Experience working with distributed, service-oriented architecture.
  • Preferred: experience developing AI agents for infrastructure automation, operational decision-making, or reliability workflows.
  • Preferred: AI/ML infrastructure experience, including LLM-based systems, model serving, or agentic workflows.
  • Preferred: technical consulting or embedded SRE experience with cross-functional engineering teams.

Benefits

  • Base salary range of $139,764 to $287,749 USD for US-based applicants.
  • Eligible for equity.
  • Remote-friendly working model with in-office collaboration required only 1-2 times every 6 months.
  • No relocation assistance is provided for this role.
  • Access to Pinterest culture and benefits information via the company benefits page.
