Samsara

Samsara pioneers the Connected Operations Cloud, which offers AI safety programs, real-time visibility, and integrations that help industries worldwide improve efficiency, safety, and sustainability.

IT Services
1K-5K employees
Founded 2015

Description

  • Design and build automated reliability and self-healing systems that protect production at scale.
  • Own and improve incident management tooling and on-call health.
  • Develop and evolve observability infrastructure, including monitoring, alerting, SLOs, and performance regression detection.
  • Contribute to AI-driven operational tooling that supports autonomous remediation and self-recovery.
  • Identify systemic reliability patterns and eliminate operational toil to prevent incidents.
  • Partner with product engineering teams to diagnose reliability gaps and reduce operational burden.
  • Define and champion operational excellence practices, guardrails, scorecards, and standards across engineering.
  • Model Samsara’s cultural principles as the company scales globally.

Requirements

  • 8+ years of experience designing and building products on a software engineering team.
  • Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
  • 3+ years of experience on infrastructure and/or platform engineering-focused teams.
  • Expertise in observability, reliability, operational metrics, and data analysis.
  • Experience architecting monitoring frameworks, SLO platforms, and automated response workflows using Datadog or equivalent tools such as New Relic or Grafana.
  • Experience with large-scale enterprise software applications.
  • Experience designing and implementing DevEx and internal portal solutions that centralize and simplify engineering operations.
  • Familiarity with cloud platforms such as AWS or GCP.
  • Experience implementing AI-driven automation across the SDLC and routinely using AI tools in day-to-day workflows.
  • Experience writing high-quality code in Go, Python, or an equivalent language for infrastructure, deployment, and operations challenges.
  • Experience mentoring engineers and operating in a technical lead capacity.
  • Strong communication skills and ability to collaborate across teams (preferred).
  • Experience with incident management tooling such as Incident.io or PagerDuty (preferred).
  • Experience with Infrastructure as Code, especially Terraform (preferred).

Benefits

  • Annual base salary range of $154,700 to $208,000 USD.
  • Eligible for an initial RSU grant with no vesting cliff and ongoing refresh opportunities tied to performance.
  • Above-market total compensation including base salary, performance-based bonus or variable pay, and equity for eligible roles.
  • Flexible, employee-led remote work model.
  • Comprehensive health plans.
  • Parental leave plans.
  • Professional development stipend.
  • Accommodation support for candidates who need it during the recruiting process.
