Remote

Global HR Solutions & Employment Tools for Distributed Teams | Remote

Hire international talent in minutes. Remote is the most disruptive global payroll, tax, HR and compliance solution for distributed teams. The easier way to employ internationally 🌍....

Professional Services
251-1K
Founded 2019
$496M raised

Description

  • Design, implement, and maintain infrastructure-as-code patterns using Terraform and Kubernetes.
  • Build and maintain monitoring, logging, and alerting systems for production services.
  • Lead incident response, conduct post-mortems, and drive reliability improvements.
  • Work with the Security team to embed security and compliance into the Remote Build infrastructure.
  • Continuously optimize system performance, resource utilization, and cloud costs.
  • Eliminate manual operational toil through automation, tools, and improved processes.
  • Partner with platform teams to improve API, MCP, and CLI resilience and observability.
  • Provide infrastructure feedback that helps shape platform evolution and developer experience.

Requirements

  • Senior-level experience in Site Reliability Engineering, DevOps Engineering, or SysOps roles.
  • Experience standing up and operating production systems at scale.
  • Deep hands-on experience running Kubernetes in production.
  • Solid AWS fundamentals across compute, networking, storage, and managed services.
  • Proficiency with Terraform or similar infrastructure-as-code tools.
  • Experience with CI/CD and deployment automation tools such as GitLab, GitHub Actions, or Jenkins.
  • Strong bash scripting and comfort debugging system-level issues and logs.
  • Understanding of Linux kernel basics.
  • Ability to communicate complex infrastructure decisions clearly to technical and non-technical stakeholders.
  • Experience with at least one backend programming language such as Elixir, Python, Go, Java, or Node.js is a plus.
  • Experience in consultancy settings is a plus.
  • Experience with container registries and artifact management such as ECR or Docker Hub is a plus.
  • Experience with observability tools such as Datadog, Prometheus, ELK, or Grafana is a plus.
  • Experience building or scaling multi-tenant platforms is a plus.
  • Application materials must be submitted in English, and a PDF CV or LinkedIn profile is required.

Benefits

  • Annual salary range of $54,000 to $150,000 USD.
  • Fair, unbiased compensation with equity pay.
  • Stock options.
  • Work from anywhere.
  • Flexible paid time off.
  • Flexible working hours in an async environment.
  • 16 weeks of paid parental leave.
  • Mental health support services.
  • Learning budget.
  • Home office budget and IT equipment.
  • Budget for local in-person social events or co-working spaces.

Similar Roles

Senior Site Reliability Engineer (Remote Build)

Remote 251-1K Professional Services

Remote is hiring a Senior Site Reliability Engineer for Remote Build to own the reliability, security, and operational strategy behind its global employment infrastructure platform.

AWS Bash CI/CD Datadog Elixir GitHub Actions GitLab Go Grafana Java Jenkins Kubernetes Linux Microservices Node.js Prometheus Python Terraform
4 hours, 49 minutes ago

Sr. Site Reliability Engineer III (6448)

MetroStar 251-1K IT Services

MetroStar is hiring a Sr. Site Reliability Engineer III to support mission-critical federal government workloads and developer tooling in a highly secure, operational environment.

Ansible AWS Bash CI/CD Kubernetes Load Balancing
1 day, 5 hours ago

NoSQL Database Engineer II

LivePerson 1K-5K Internet Software & Services

LivePerson is hiring a NoSQL Database Engineer (L2) in India to support production reliability and platform engineering for large-scale NoSQL systems and cloud infrastructure.

Bash Cassandra Couchbase GCP Go Grafana Prometheus Python Redis Terraform
2 days, 5 hours ago

Sr. Production Engineer, Solutions Engineering

Pinterest 5K-10K Internet Software & Services

Pinterest is hiring a Senior Production Engineer on Solutions Engineering to design AI-driven reliability and automation systems that improve the operation of large-scale distributed infrastructure serving hundreds of millions of users.

Ansible AWS Azure Chef Docker Envoy GCP Go Hadoop Kafka Kubernetes Linux MySQL Puppet Python Terraform Unix
2 days, 5 hours ago
