Offchain Labs

Offchain Labs is a New York-based company building Arbitrum Rollup, a scaling solution for Ethereum that increases dapp capacity without sacrificing security.

Internet Software & Services
11-50 employees
Founded 2018
$124M raised

Description

  • Operate production Kubernetes clusters and manage the supporting platform services.
  • Build scalable, declarative infrastructure using Terraform or similar tooling.
  • Design and maintain CI/CD workflows for both infrastructure and application deployments.
  • Develop GitOps-style delivery processes using tools such as ArgoCD and ApplicationSets.
  • Troubleshoot production incidents, low-level system issues, and on-call escalations under pressure.
  • Drive postmortems and reliability improvements after incidents.
  • Design and operate observability systems using metrics, logs, and dashboards.
  • Diagnose networking and storage issues across complex distributed systems.
  • Implement secure-by-default infrastructure and participate in architecture reviews and threat modeling.
  • Automate operational workflows using scripting or programming in Python, Go, or Bash.

Requirements

  • Experience operating and maintaining production Kubernetes clusters.
  • Experience building declarative infrastructure with Terraform or similar tools.
  • Experience designing CI/CD workflows with ArgoCD, GitHub Actions, CodeBuild, or similar tools.
  • Experience with observability tooling such as Prometheus, Loki, Mimir, Grafana, and CloudWatch.
  • Experience troubleshooting networking and storage issues in distributed systems.
  • Experience implementing secure infrastructure, least-privilege access, and threat modeling.
  • Experience automating operational workflows with Python, Go, or Bash.
  • Comfort working in Linux and using shell scripting.
  • Experience with cloud platforms such as AWS, GCP, or Azure and understanding of their underlying components.
  • Experience participating in on-call rotations and incident response (required).
  • Familiarity with GitOps-style systems, ArgoCD ApplicationSets, and cloud portability (preferred).

Benefits

  • Remote-first global workforce with a New York office.
  • Annual company offsite and team onsites.
  • Professional reimbursement program for conferences, certifications, and other learning opportunities.
  • Medical, dental, and vision coverage for the U.S. and some other countries.
  • 401(k) retirement plan with company match for U.S. employees.
  • Wellness stipend.
  • Home office setup and ergonomic equipment program.
