Pinterest

Pinterest is the world's first visual discovery engine, offering a vast dataset of ideas with over 200 billion recipes, home hacks, and style inspiration. With a mission to inspire everyone to create a life they love, Pinterest empowers its employees t...

Industry: Internet Software & Services
Company size: 5K-10K employees
Founded: 2010

Description

  • Ensure the reliability, availability, and performance of production infrastructure and platform services.
  • Operate and scale Kubernetes platforms, including support for multi-tenant workloads.
  • Manage GitOps deployment workflows using ArgoCD and Helm.
  • Support infrastructure provisioning and change management with Terraform and Terragrunt.
  • Build and maintain CI/CD automation and deployment workflows using GitHub Actions.
  • Participate in incident response, root cause analysis, and post-incident improvements.
  • Reduce operational toil through scripting, tooling, and process automation.
  • Improve observability across logs, metrics, traces, dashboards, and alerting.
  • Support secure secrets integration, IAM-aware operations, and platform guardrails.
  • Partner with application, security, and platform teams to improve reliability and delivery outcomes.

Requirements

  • 4+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Cloud Infrastructure.
  • Strong hands-on experience operating AWS in production environments.
  • Strong Kubernetes experience, including cluster operations, troubleshooting, workload reliability, and platform administration.
  • Experience with Kubernetes multi-tenancy, including namespaces, RBAC, quotas, policies, and tenant isolation patterns.
  • Experience implementing and operating ArgoCD in a GitOps delivery model.
  • Strong hands-on experience with Helm.
  • Experience with Terraform or Terragrunt for infrastructure provisioning and environment management.
  • Solid scripting and automation skills using Bash and/or Python.
  • Experience building, maintaining, or supporting CI/CD pipelines, ideally using GitHub Actions.
  • Strong troubleshooting skills across Linux, containers, IAM, networking, and distributed systems.
  • Experience with monitoring, alerting, and observability in production environments.
  • Demonstrated ownership mindset with experience handling incidents and resolving production issues.
  • Strong collaboration and communication skills across engineering, security, and platform teams.
  • Bachelor’s degree in computer science, engineering, a related field, or equivalent experience.
  • Demonstrated ability to use AI tools to improve the speed and quality of day-to-day work.
  • Strong track record of critically evaluating and verifying AI-assisted work through testing, source-checking, data validation, or peer review.
  • High integrity and ownership, including protecting sensitive data, avoiding over-reliance on AI, and remaining accountable for final decisions and deliverables.

Benefits

  • Base salary range of $114,297 to $235,319 USD for US-based applicants.
  • Eligible for equity.
  • Flexible working model through PinFlex, with in-office needs varying by role and department.
  • No relocation assistance for this position.
  • Access to Pinterest’s culture and benefits information for the role.
  • Equal opportunity employer with accommodation support during the application process.
