RapidSOS

RapidSOS is an advanced emergency technology provider that connects life-saving data from devices, apps, and sensors to emergency responders, shortening response times and improving outcomes in critical situations.

Diversified Telecommunication Services

Telecommunication Services

51-250 (150)

Founded 2013

$281M raised

7 open positions


Site Reliability Engineering Manager

3 weeks, 4 days ago

United States

Full-time

Lead

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Argo CD AWS Datadog GitHub Actions Helm Kubernetes Python RabbitMQ Terraform


Description

Own the reliability, scalability, and operational health of Kubernetes clusters, shared services, and core AWS infrastructure.
Drive infrastructure-as-code standards using Terraform and Atlantis.
Partner with engineering managers to define SLOs, error budgets, and service ownership practices.
Lead proactive reliability work, including capacity planning, failure mode analysis, runbook quality, chaos engineering, and reliability reviews.
Drive blameless postmortems and ensure incidents result in systemic improvements with clear ownership.
Run the Tier 1 on-call rotation and coordinate with the third-party network operations center (NOC).
Lead incident command for Sev-1 incidents and keep engineering leadership informed.
Mentor and grow the SRE Operations team, including headcount planning as the function expands.
Shape the long-term AI strategy for infrastructure and operations through automation and workflow improvements.
Own AWS cost management, reserved instance strategy, and reporting on reliability metrics to leadership.
Collaborate with Platform SRE on major infrastructure initiatives such as Gateway API adoption, cross-region architecture, and security changes.

Requirements

7+ years of experience in SRE, platform engineering, or DevOps.
At least 2 years of experience managing a team.
Direct ownership of Kubernetes and AWS infrastructure in production environments where uptime and resilience are critical.
Experience shifting teams from reactive operations to engineering-first reliability practices.
Experience partnering with engineering teams to improve reliability, scalability, and operational readiness before production issues occur.
Ability to write Python and review production-quality scripts and tooling.
Hands-on experience with SLOs, error budgets, and blameless postmortems.
Hands-on familiarity with Terraform/Atlantis, Kubernetes/Helm/Argo CD, Datadog, Concourse CI/GitHub Actions, RabbitMQ, and AWS services including EKS, RDS/Aurora, ElastiCache, VPC networking, IAM, KMS, and Route 53.
Experience with AI-driven automation or operations tooling is preferred.
Experience in mission-critical or high-availability environments is preferred.

Benefits

Salary range of $185,000 to $215,000.
Equity options.
Competitive benefits package.
Flexible remote work environment.
Dynamic, fun startup environment with a highly talented team.
Opportunity to work on a mission-driven product with global impact.

Apply directly on the company website

Similar Roles

Data Center Reliability Engineer

Phaidra | 51-250 | Internet Software & Services

Phaidra is hiring a Data Center Reliability Engineer to translate data center telemetry into operational intelligence for its AI-powered monitoring and control systems.

United States | Full-time | Junior | Site Reliability Engineer (SRE)

$101k-$164k

GitLab LLM Machine Learning NumPy Pandas Python Reinforcement Learning

12 hours, 50 minutes ago

Senior Site Reliability Engineer

Accenture | 100K+ | Professional Services

Accenture Federal Services is hiring a Site Reliability Engineer to improve the reliability, performance, and scalability of a client system supporting US federal mission operations.

United States | Full-time | Senior | Site Reliability Engineer (SRE)

$106k-$221k

13 hours, 20 minutes ago

Senior Site Reliability Engineer (Remote Build)

Remote | 251-1K | Professional Services

Remote is hiring a Senior Site Reliability Engineer for Remote Build to own the reliability, security, and operational strategy behind its global employment infrastructure platform.

Middle East, Africa, Europe | Full-time | Senior | Site Reliability Engineer (SRE)

$54k-$150k

AWS Bash CI/CD Datadog Elixir GitHub Actions GitLab Go Grafana Java Jenkins Kubernetes Linux Microservices Node.js Prometheus Python Terraform

1 day, 12 hours ago

Senior Site Reliability Engineer (Remote Build)

Remote | 251-1K | Professional Services

Remote is hiring a Senior Site Reliability Engineer to own the reliability, security, and operational strategy for Remote Build’s global infrastructure platform supporting AI-driven HR and Finance integrations.

Anywhere | Full-time | Senior | Site Reliability Engineer (SRE)

$54k-$150k

AWS Bash CI/CD Datadog Elixir GitHub Actions GitLab Go Grafana Java Jenkins Kubernetes Linux Microservices Node.js Prometheus Python Terraform

1 day, 13 hours ago
