Remote

Global HR Solutions & Employment Tools for Distributed Teams. Hire international talent in minutes. Remote is the most disruptive global payroll, tax, HR and compliance solution for distributed teams. The easier way to employ internationally....

Professional Services
251-1K employees
Founded 2019
$496M raised

Description

  • Lead the discovery and delivery of reliability and infrastructure solutions for complex, ambiguous problems.
  • Own planning and execution of features and projects within the SRE/Platform domain.
  • Contribute to platform architecture, tooling, and roadmap decisions.
  • Define and operate reliability practices such as SLOs, SLIs, error budgets, alerting, and observability.
  • Resolve cross-team requests, identify systemic issues, and turn recurring issues into reusable fixes and runbooks.
  • Build and operationalize AI-native workflows, reusable prompts, skills, and tooling for the team.
  • Establish secure-by-default patterns, CI protections, and AI-assisted review practices.
  • Mentor less-senior engineers and provide timely, actionable feedback.
  • Participate in hiring, onboarding, and RFC discussions.
  • Collaborate with Security on platform hardening, threat mitigation, capacity, and cost-efficiency.
  • Participate in incident response and on-call rotations to maintain system reliability.

Requirements

  • Solid professional experience in SRE, DevOps, or Platform Engineering.
  • Hands-on experience operating and scaling Kubernetes production clusters and Docker/container tooling.
  • Experience building and managing cloud infrastructure on AWS or a similar cloud provider.
  • Strong infrastructure-as-code experience with Terraform.
  • Experience with reliability frameworks including SLOs, SLIs, error budgets, and alerting strategies.
  • Solid observability experience with OpenTelemetry, Grafana, Prometheus, or similar tools.
  • Experience with CI/CD and deployment automation, such as GitLab CI or GitHub Actions.
  • Comfort with Golang and Bash/scripting; broader programming experience is a plus.
  • Practical, embedded use of AI in infrastructure, operations, or development work with observable results.
  • Clear communication skills in an async-first, global environment.
  • Proactive, curious, and comfortable taking ownership of challenges.
  • Collaborative and respectful across cultures, time zones, and backgrounds.
  • Experience with one backend programming language such as Elixir, Node.js, or Python is preferred.
  • Experience running and configuring Linux systems in a non-cloud environment is preferred.
  • Security knowledge from both defensive and offensive perspectives is preferred.
  • Must submit application and CV in English.
  • Must upload a PDF CV or provide an up-to-date LinkedIn profile.

Benefits

  • Annual salary range of $53,300 to $119,850 USD.
  • Fair, unbiased compensation with equity pay.
  • Stock options.
  • Work from anywhere with a fully remote setup.
  • Flexible paid time off.
  • Flexible working hours in an async work environment.
  • 16 weeks of paid parental leave.
  • Mental health support services.
  • Learning budget.
  • Home office budget and IT equipment.
  • Budget for local in-person social events or co-working spaces.
