Remote

Global HR Solutions & Employment Tools for Distributed Teams. Hire international talent in minutes. Remote is the most disruptive global payroll, tax, HR and compliance solution for distributed teams. The easier way to employ internationally 🌍...

Professional Services
251-1K
Founded 2019
$496M raised

Description

  • Design, implement, and maintain infrastructure-as-code patterns using Terraform and Kubernetes for standard connectors and custom builds.
  • Build and maintain monitoring, logging, and alerting systems to support observability.
  • Lead incident response efforts, conduct post-mortems, and drive reliability improvements.
  • Work with the Security team to embed security into the Build infrastructure and support compliance across 100+ jurisdictions.
  • Continuously optimize system performance, resource utilization, and cloud costs.
  • Identify and eliminate manual operational toil through automation and improved processes.
  • Partner with platform teams to ensure APIs, MCP, and CLI are resilient and observable.
  • Provide infrastructure feedback that helps shape platform evolution and developer experience.

Requirements

  • Senior-level experience in Site Reliability Engineering, DevOps Engineering, or SysOps roles.
  • Experience standing up and operating production systems at scale.
  • Deep hands-on experience running Kubernetes in production.
  • Solid AWS fundamentals across compute, networking, storage, and managed services.
  • Proficiency with Terraform or similar infrastructure-as-code tools.
  • Experience with CI/CD and deployment automation tools such as GitLab, GitHub Actions, or Jenkins.
  • Strong bash scripting skills.
  • Comfort debugging system-level issues, reading logs, and understanding Linux kernel basics.
  • Ability to communicate complex infrastructure decisions clearly to technical and non-technical stakeholders.
  • Experience with at least one backend programming language such as Elixir, Python, Go, Java, or Node.js (preferred).
  • Experience in consultancy settings (preferred).
  • Experience with container registries and artifact management such as ECR or Docker Hub (preferred).
  • Experience with observability tools such as Datadog, Prometheus, ELK, or Grafana (preferred).
  • Experience working with or scaling multi-tenant platforms (preferred).

Benefits

  • Annual salary range of $54,000 to $150,000 USD.
  • Work from anywhere with fully remote employment.
  • Flexible paid time off.
  • Flexible working hours with an async work culture.
  • 16 weeks of paid parental leave.
  • Mental health support services.
  • Stock options.
  • Learning budget.
  • Home office budget and IT equipment.
  • Budget for local in-person social events or co-working spaces.
