RapidSOS

RapidSOS is an advanced emergency technology provider that connects life-saving data from various devices, apps, and sensors to emergency responders, enhancing response times and improving outcomes in critical situations.

Diversified Telecommunication Services
51-250 employees
Founded 2013
$281M raised

Description

  • Own performance and reliability outcomes for services operating at scale, including the application-level choices that affect system behavior.
  • Design and implement resilience improvements such as safer deployment patterns, failover strategies, and redundancy.
  • Instrument services with structured logging, metrics, and alerting to improve observability and debugging.
  • Take production incidents from first signal through root cause analysis and resolution, including fixes that strengthen long-term stability.
  • Work across infrastructure-as-code, container orchestration, CI/CD pipelines, and service-level application code to resolve issues end to end.
  • Collaborate with engineering teams to improve reliability and performance across the systems they own.
  • Investigate issues across infrastructure and application layers to identify and fix problems at the source.
  • Help shape the organization’s reliability practices by improving visibility, resilience, and operational readiness.

Requirements

  • 5+ years of professional engineering experience with deep expertise in Python.
  • Experience with AWS infrastructure, including networking, managed databases, IAM, DNS-based routing, failover, and the cost implications of traffic routing.
  • Hands-on experience with Kubernetes and containerized production workloads across EKS, ECS, or Fargate.
  • Strong understanding of distributed systems failure modes, including resource exhaustion, replication lag, and queue backpressure.
  • Experience operating high-throughput messaging systems such as RabbitMQ, Kafka, AWS SNS, or SQS.
  • Experience with infrastructure-as-code tools such as Terraform and CI/CD pipelines, with a focus on reliability and scalability.
  • Experience building or improving observability through logging, metrics, and alerting.
  • Demonstrable experience using AI to safely and securely improve velocity, reliability, and recoverability of services.
  • Strong communication and interpersonal skills, and the ability to collaborate effectively within a team.
  • Strong proficiency in coding best practices and the ability to write clean, maintainable, and testable code.
  • Demonstrated problem-solving ability across both infrastructure and application layers.
  • Ability and willingness to collaborate in person a few times per quarter, or as needed.
  • Preferred: experience supporting production systems in an on-call or similar reliability-focused capacity.
  • Preferred: experience with Datadog, Elasticsearch/OpenSearch, ArgoCD-based GitOps deployments, and modernizing legacy CI/CD tools such as Concourse or Jenkins.

Benefits

  • Salary range of $160,000 to $195,000, depending on experience, skills, education, location, and business needs.
  • Equity options / equity participation.
  • Competitive salary and benefits package.
  • Flexible, dynamic, and fun startup work environment.
  • Opportunity to work with a passionate, highly talented team on a mission-driven problem.
  • Remote-friendly role.
  • Equal opportunity workplace with inclusive hiring practices.
