Staff ML Engineer - ML Infrastructure

11 hours, 46 minutes ago
Full-time
Lead
DevOps and Infrastructure
Samsara

Samsara pioneers the Connected Operations Cloud, offering AI safety programs, real-time visibility, and integrations that help industries improve efficiency, safety, and sustainability worldwide.

IT Services
1K-5K
Founded 2015

Description

  • Design, build, and operate Samsara’s end-to-end ML platform, including training, experimentation, batch and online inference, and edge workflows.
  • Evolve shared training and experimentation infrastructure and standardize tracking, evaluation, and regression testing for safe iteration.
  • Partner with product and applied ML teams to ship ML-powered features that improve safety, reliability, and cost efficiency.
  • Lead throughput and cost modeling for new ML features to support capacity planning and roadmap decisions.
  • Drive experiment design and evaluation, including defining success metrics and structuring A/B or offline tests.
  • Design and operate scalable online and batch inference systems with observability, SLOs, and unified training-to-production workflows.
  • Partner with firmware and edge teams to package, validate, and deploy models to devices and build cloud-to-edge feedback loops.
  • Own reliability, observability, and security for ML systems across cloud and edge, including incident response and infrastructure hardening.
  • Provide Staff+/Senior Staff technical leadership on ML infrastructure architecture and strategy, while mentoring engineers and applied scientists.
  • Improve developer experience through documentation, office hours, best practices, and open source contribution.

Requirements

  • 10+ years of overall experience in machine learning engineering or related fields.
  • Strong experience with distributed computing frameworks such as Ray and/or Spark.
  • Hands-on experience with AWS, containers/Kubernetes, and production observability tooling.
  • Proven experience building or supporting ML platforms used by multiple teams.
  • Solid understanding of ML fundamentals, including evaluation, experiment design, and model iteration in production.
  • Experience shipping ML-powered features end-to-end with measurable impact on product or business metrics (preferred).
  • Background in computer vision and/or LLM-based systems in production environments (preferred).
  • Experience with edge or on-device ML and collaboration with firmware or embedded teams (preferred).
  • Familiarity with model lifecycle systems such as model registry, deployment, monitoring, rollback, and drift detection (preferred).
  • Experience working in environments with strong security and compliance requirements (preferred).

Benefits

  • Annual base salary range of CAD $196,000 to $269,500.
  • Eligible for an initial RSU grant with no vesting cliff and ongoing refresh opportunities tied to performance.
  • Above-market total compensation that may include base salary, performance-based bonus/variable pay, and equity for eligible roles.
  • Flexible, employee-led remote work model.
  • Professional development stipend.
  • Comprehensive health plans.
  • Parental leave plans.
