RapidSOS

RapidSOS is an advanced emergency technology provider that connects life-saving data from various devices, apps, and sensors to emergency responders, enhancing response times and improving outcomes in critical situations.

Diversified Telecommunication Services

Telecommunication Services

51-250 (150)

Founded 2013

$281M raised

7 open positions

Senior Site Reliability Engineer

1 month, 3 weeks ago

United States

Full-time

Senior

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Argo CD AWS CI/CD Datadog DNS Elasticsearch GitOps Jenkins Kafka Kubernetes OpenSearch Python RabbitMQ Terraform

Description

Own performance and reliability outcomes for services operating at scale, including the application-level choices that affect system behavior.
Design and implement resilience improvements such as safer deployment patterns, failover strategies, and redundancy.
Instrument services with structured logging, metrics, and alerting to improve observability and debugging.
Take production incidents from first signal through root cause analysis and resolution, including fixes that strengthen long-term stability.
Work across infrastructure-as-code, container orchestration, CI/CD pipelines, and service-level application code to resolve issues end to end.
Collaborate with engineering teams to improve reliability and performance across the systems they own.
Investigate issues across infrastructure and application layers to identify and fix problems at the source.
Help shape the organization’s reliability practices by improving visibility, resilience, and operational readiness.

Requirements

5+ years of professional engineering experience with deep expertise in Python.
Experience with AWS infrastructure, including networking, managed databases, IAM, DNS-based routing, failover, and traffic-routing cost implications.
Hands-on Kubernetes experience with containerized workloads in production across EKS, ECS, or Fargate.
Strong understanding of distributed systems failure modes, including resource exhaustion, replication lag, and queue backpressure.
Experience operating high-throughput messaging systems such as RabbitMQ, Kafka, AWS SNS, or SQS.
Experience with infrastructure-as-code tools such as Terraform and CI/CD pipelines, with a focus on reliability and scalability.
Experience building or improving observability through logging, metrics, and alerting.
Demonstrable experience using AI to safely and securely improve velocity, reliability, and recoverability of services.
Strong communication and interpersonal skills, with the ability to collaborate effectively as a team player.
Strong proficiency in coding best practices and the ability to write clean, maintainable, and testable code.
Demonstrated problem-solving ability across both infrastructure and application layers.
Ability and willingness to collaborate in person a few times per quarter, or as needed.
Preferred: experience supporting production systems in an on-call or similar reliability-focused capacity.
Preferred: experience with Datadog, Elasticsearch/OpenSearch, Argo CD-based GitOps deployments, and modernizing legacy CI/CD tools such as Concourse or Jenkins.

Benefits

Salary range of $160,000 to $195,000, depending on experience, skills, education, location, and business needs.
Equity options / equity participation.
Competitive salary and benefits package.
Flexible, dynamic, and fun startup work environment.
Opportunity to work with a passionate, highly talented team on a mission-driven problem.
Remote-friendly role (#LI-Remote).
Equal opportunity workplace with inclusive hiring practices.

Similar Roles

Data Center Reliability Engineer

Phaidra 51-250 Internet Software & Services

Phaidra is hiring a Data Center Reliability Engineer to translate data center telemetry into operational intelligence for its AI-powered monitoring and control systems.

United States Full-time Junior Site Reliability Engineer (SRE)

$101k-$164k

GitLab LLM Machine Learning NumPy Pandas Python Reinforcement Learning

12 hours, 51 minutes ago

Senior Site Reliability Engineer

Accenture 100K+ Professional Services

Accenture Federal Services is hiring a Site Reliability Engineer to improve the reliability, performance, and scalability of a client system supporting US federal mission operations.

United States Full-time Senior Site Reliability Engineer (SRE)

$106k-$221k

13 hours, 20 minutes ago

Senior Site Reliability Engineer (Remote Build)

Remote 251-1K Professional Services

Remote is hiring a Senior Site Reliability Engineer for Remote Build to own the reliability, security, and operational strategy behind its global employment infrastructure platform.

Middle East Africa Europe Full-time Senior Site Reliability Engineer (SRE)

$54k-$150k

AWS Bash CI/CD Datadog Elixir GitHub Actions GitLab Go Grafana Java Jenkins Kubernetes Linux Microservices Node.js Prometheus Python Terraform

1 day, 12 hours ago

Senior Site Reliability Engineer (Remote Build)

Remote 251-1K Professional Services

Remote is hiring a Senior Site Reliability Engineer to own the reliability, security, and operational strategy for Remote Build’s global infrastructure platform supporting AI-driven HR and Finance integrations.

Anywhere Full-time Senior Site Reliability Engineer (SRE)

$54k-$150k

AWS Bash CI/CD Datadog Elixir GitHub Actions GitLab Go Grafana Java Jenkins Kubernetes Linux Microservices Node.js Prometheus Python Terraform

1 day, 13 hours ago
