Principal Engineer, Compute Platform

1 hour, 44 minutes ago
Full-time
Lead
DevOps and Infrastructure
Pinterest

Pinterest is the world's first visual discovery engine, offering a vast dataset of ideas with over 200 billion recipes, home hacks, and style inspiration. With a mission to inspire everyone to create a life they love, Pinterest empowers its employees t...

Internet Software & Services
5K-10K
Founded 2010

Description

  • Lead the consolidation and modernization of Pinterest’s shared compute infrastructure under PinCompute.
  • Design and implement Kubernetes-based solutions that scale for large, stateful, and data-intensive workloads.
  • Replace isolated dedicated compute pools with a large-scale shared, container-based compute platform.
  • Work with platform leads and internal customers to define features, migration paths, and platform requirements.
  • Increase platform utilization through workload stacking, bin packing, oversubscription, and related efficiency techniques.
  • Lead engineering discussions and decisions on design, execution, trade-offs, observability, performance, and operability.
  • Evolve the platform toward a multi-cloud abstraction layer that supports workloads across cloud providers.
  • Partner on capacity planning, cost visibility, instance-type fungibility, and infrastructure efficiency.
  • Drive delivery of GPU resources through the platform to support AI workloads.
  • Use AI tools to accelerate migrations, improve self-service, and apply AI to operational troubleshooting and root-cause analysis.
  • Set high standards for production quality and engineering excellence across the platform team.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • 12+ years of relevant industry experience building large-scale production distributed systems.
  • 5+ years of experience with Kubernetes in production.
  • Experience working across SWE and SRE or Production Engineering teams to deliver robust production systems.
  • Experience running distributed data systems and migrating them to Kubernetes is highly preferred.
  • Ability to work with cross-functional partners across multiple organizations.
  • Passion for automation, reducing toil, and building effective tooling.
  • Experience with stateful workloads and GPU-heavy AI workloads is relevant to the role.
  • Ability to operate in ambiguous situations with evolving workload, production, and multi-tenancy requirements.
  • Experience with cloud infrastructure and multi-cloud environments is beneficial.

Benefits

  • Base salary range of $242,634 to $499,541 USD for US-based applicants.
  • Eligible for equity.
  • Remote-eligible role with in-office collaboration required only 1-2 times per quarter.
  • Role can be based anywhere in the country.
  • No relocation assistance provided.
  • Access to Pinterest’s benefits package and company culture resources.
