Staff ML Engineer - ML Infrastructure

1 hour, 16 minutes ago
Full-time
Lead
DevOps and Infrastructure
Samsara

Samsara pioneers the Connected Operations Cloud, offering AI safety programs, real-time visibility, and integrations that help industries improve efficiency, safety, and sustainability worldwide.

IT Services
1K-5K
Founded 2015

Description

  • Design, build, and operate Samsara’s end-to-end ML platform across training, experimentation, batch and online inference, and edge deployment.
  • Evolve shared training and experimentation infrastructure, including orchestration, clusters, environments, tracking, evaluation, and regression testing.
  • Partner with product and applied ML teams to ship ML-powered features that improve safety, reliability, and cost efficiency.
  • Lead throughput and cost modeling for new ML features to support capacity planning, roadmap decisions, and go/no-go calls.
  • Drive experiment design and evaluation, including success metrics, A/B tests, offline tests, and interpretation of results.
  • Design and operate scalable online and batch inference systems with observability, SLOs, and unified training-to-production workflows.
  • Work with firmware and edge teams to package, validate, and deploy models to devices and create feedback loops from edge to cloud.
  • Own reliability, observability, security, incident response, and infrastructure hardening for ML systems across cloud and edge.
  • Provide Staff+/Senior Staff technical leadership, influence cross-team architecture decisions, and mentor engineers and applied scientists.
  • Improve developer experience through documentation, office hours, best practices, and open source participation.

Requirements

  • 10+ years of experience in machine learning engineering or a related field, including building and operating large-scale ML systems.
  • Strong experience with distributed computing frameworks such as Ray and/or Spark.
  • Hands-on experience with cloud infrastructure on AWS, containers/Kubernetes, and production observability tooling.
  • Proven experience building or supporting ML platforms for training, experimentation, or inference used by multiple teams.
  • Solid understanding of ML fundamentals, including evaluation, experiment design, and model iteration in production environments.
  • Experience shipping ML-powered features end-to-end with measurable product or business impact (preferred).
  • Background in computer vision and/or LLM-based systems in production environments (preferred).
  • Experience with edge or on-device ML and collaboration with firmware or embedded teams (preferred).
  • Familiarity with model lifecycle systems such as model registry, deployment, monitoring, rollback, and drift detection (preferred).
  • Experience working in environments with strong security and compliance requirements (preferred).
  • Ability to lead across teams at Staff+ scope (preferred).

Benefits

  • Annual base salary range of $200,200 to $357,500 USD.
  • Eligible for an initial RSU grant with no vesting cliff.
  • Ongoing RSU refresh opportunities tied to performance, subject to plan terms and conditions.
  • Above-market total compensation, including base salary, performance-based bonus/variable pay, and equity for eligible roles.
  • Flexible, employee-led remote work model.
  • Comprehensive health plans.
  • Parental leave plans.
  • Professional development stipend.
