RapidSOS

RapidSOS is an advanced emergency technology provider that connects life-saving data from various devices, apps, and sensors to emergency responders, enhancing response times and improving outcomes in critical situations.

Diversified Telecommunication Services
51-250
Founded 2013
$281M raised

Description

  • Own performance and reliability outcomes for services operating at scale, including the application-level choices that affect system behavior.
  • Design and implement resilience improvements such as safer deployment patterns, failover strategies, and redundancy.
  • Instrument services with structured logging, metrics, and alerting to improve observability and debugging.
  • Take production incidents from first signal through root cause analysis and resolution, including fixes that strengthen long-term stability.
  • Work across infrastructure-as-code, container orchestration, CI/CD pipelines, and service-level application code to resolve issues end to end.
  • Collaborate with engineering teams to improve reliability and performance across the systems they own.
  • Investigate issues across infrastructure and application layers to identify and fix problems at the source.
  • Help shape the organization’s reliability practices by improving visibility, resilience, and operational readiness.

Requirements

  • 5+ years of professional engineering experience with deep expertise in Python.
  • Experience with AWS infrastructure, including networking, managed databases, IAM, DNS-based routing, failover, and the cost implications of traffic routing.
  • Hands-on experience running containerized workloads in production on Kubernetes (EKS), ECS, or Fargate.
  • Strong understanding of distributed systems failure modes, including resource exhaustion, replication lag, and queue backpressure.
  • Experience operating high-throughput messaging systems such as RabbitMQ, Kafka, AWS SNS, or SQS.
  • Experience with infrastructure-as-code tools such as Terraform and CI/CD pipelines, with a focus on reliability and scalability.
  • Experience building or improving observability through logging, metrics, and alerting.
  • Demonstrable experience using AI to safely and securely improve velocity, reliability, and recoverability of services.
  • Strong communication and interpersonal skills, and the ability to collaborate effectively within a team.
  • Strong proficiency in coding best practices and the ability to write clean, maintainable, and testable code.
  • Demonstrated problem-solving ability across both infrastructure and application layers.
  • Ability and willingness to collaborate in person a few times per quarter, or as needed.
  • Preferred: experience supporting production systems in an on-call or similar reliability-focused capacity.
  • Preferred: experience with Datadog, Elasticsearch/OpenSearch, ArgoCD-based GitOps deployments, and modernizing legacy CI/CD tools such as Concourse or Jenkins.

Benefits

  • Salary range of $160,000 to $195,000, depending on experience, skills, education, location, and business needs.
  • Equity options / equity participation.
  • Competitive salary and benefits package.
  • Flexible, dynamic, and fun startup work environment.
  • Opportunity to work with a passionate, highly talented team on a mission-driven problem.
  • Remote-friendly role (#LI-Remote).
  • Equal opportunity workplace with inclusive hiring practices.
