Akuity

Akuity is the enterprise company for Argo CD. It provides expert support and industry-leading Kubernetes application delivery software, and it enables GitOps within organizations.

Professional Services
11-50 employees
Founded 2021
$24M raised

Description

  • Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement against them.
  • Design, instrument, and maintain observability systems across multi-region AWS infrastructure.
  • Identify reliability gaps, lead blameless post-mortems, and implement permanent fixes.
  • Partner with engineering teams to build reliability into new features before production release.
  • Participate in an on-call rotation and serve as incident commander for high-severity production events.
  • Build and maintain runbooks, escalation paths, and incident playbooks to reduce time to resolution.
  • Improve alerting quality by reducing noise, increasing signal, and eliminating toil.
  • Lead post-incident reviews with clear timelines, root cause analysis, and follow-through on action items.

Requirements

  • 5+ years of SRE, platform engineering, or production operations experience in a SaaS environment.
  • Deep hands-on Kubernetes expertise, including scheduler, networking, storage, and autoscaling.
  • Strong AWS fundamentals across EC2, EKS, VPC, NLB, Route53, S3, RDS, and IAM.
  • Experience defining and operating against SLOs in production, including defining error budgets.
  • Proficiency with observability tooling such as Prometheus, Grafana, OpenTelemetry, Datadog, or equivalent.
  • Strong scripting and automation skills in Go, Python, Bash, or similar.
  • Strong written communication skills for runbooks, incident reports, and post-mortems.
  • Must live in a time zone between US Pacific and US Eastern; this includes Canada and other regions within those time zones.
  • Experience with Argo CD, Kargo, or GitOps-based delivery workflows is a strong advantage.
  • Familiarity with multi-region, multi-cluster Kubernetes deployments is a strong advantage.
  • Experience with compliance-adjacent infrastructure such as SOC 2, ISO 27001, HIPAA, or PCI DSS is a strong advantage.
  • Background operating infrastructure for platform or developer tooling companies is a strong advantage.

Benefits

  • Competitive compensation commensurate with experience.
  • Equity participation in a well-funded, growing company.
  • Fully remote work from anywhere within US time zones, including Canada and other regions that share those time zones.
  • Home office stipend and equipment budget.
  • Flexible time off with a culture that respects it.
  • Full benefits for US-based employees, including comprehensive health, dental, and vision coverage.
  • Opportunity to work directly with the engineers who built Argo CD and Kargo.
  • Candidates based outside the US will be engaged as contractors.
