Remote

Global HR Solutions & Employment Tools for Distributed Teams | Remote

Hire international talent in minutes. Remote is the most disruptive global payroll, tax, HR and compliance solution for distributed teams. The easier way to employ internationally 🌍...

Professional Services
251-1K employees
Founded 2019
$496M raised

Description

  • Lead solution discovery and delivery for complex reliability and infrastructure problems with significant ambiguity or scope.
  • Own the plan and execution of features and projects within the SRE/Platform domain.
  • Contribute to the platform architecture, tooling, and roadmap, and influence team priorities.
  • Define and operate reliability practices such as SLOs, SLIs, error budgets, alerting, and observability.
  • Use support and incident metrics to shape technical strategy and improve operational posture.
  • Resolve cross-team requests, identify systemic issues, and turn recurring problems into reusable fixes and runbooks.
  • Build and operationalize AI-native workflows, reusable prompts, and tooling that help the team ship faster and more safely.
  • Mentor less-senior engineers, provide actionable feedback, and participate in hiring, onboarding, and RFC discussions.
  • Collaborate with Security on platform hardening, threat mitigation, capacity planning, and cost efficiency.
  • Participate in incident response and on-call rotations to restore service quickly and maintain reliability.

Requirements

  • Solid professional experience in SRE, DevOps, or Platform Engineering.
  • Hands-on experience operating and scaling Kubernetes production clusters and Docker/container tooling.
  • Experience building and managing cloud infrastructure on AWS or a similar provider.
  • Strong infrastructure-as-code experience with Terraform.
  • Experience with reliability frameworks including SLOs, SLIs, error budgets, and alerting strategies.
  • Strong observability background with tools such as OpenTelemetry and Grafana/Prometheus or similar.
  • Experience with CI/CD and deployment automation using GitLab CI, GitHub Actions, or similar.
  • Proficiency with Golang and Bash/scripting; broader programming experience is a plus.
  • Practical, embedded use of AI in infrastructure, operations, or development work with observable results.
  • Clear communication in an async-first, global environment.
  • Proactive, curious, and comfortable taking ownership of challenges.
  • Collaborative and respectful across cultures, time zones, and backgrounds.
  • Experience with at least one backend language such as Elixir, Node.js, or Python is preferred.
  • Experience running and configuring Linux systems in a non-cloud environment is preferred.
  • Security knowledge and capabilities from both defensive and offensive perspectives are preferred.
  • Candidates located in Europe are prioritized for this hire.
  • Ability to start as soon as possible.

Benefits

  • Annual salary range of $53,300 to $119,850 USD.
  • Fair, unbiased compensation with equity pay, set above in-location market rates.
  • Fully remote work from anywhere.
  • Flexible paid time off.
  • Flexible working hours in an async work environment.
  • 16 weeks of paid parental leave.
  • Mental health support services.
  • Stock options.
  • Learning budget.
  • Home office budget and IT equipment.
  • Budget for local in-person social events or co-working spaces.
