Airalo

Airalo is the world's first eSIM store offering travelers access to eSIMs in 200+ countries & regions at affordable prices. With Airalo, travelers can manage their eSIMs, top up on the go, and enjoy pain-free connectivity while traveling. Say goodbye t...

Airlines

Industrials

51-250 (150)

Founded 2019

$67M raised

22 open positions

Senior Site Reliability Engineer

19 hours, 2 minutes ago

United Kingdom, Spain

Full-time

Senior

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Agile, AWS, CI/CD, Datadog, GitHub Actions, Go, Java, Kubernetes, Microservices, OpenTelemetry, Prometheus, Python, Scrum, Terraform

Description

Lead the design of scalable, fault-tolerant, and self-healing systems in a multi-region AWS environment.
Define and track service level objectives (SLOs) and service level indicators (SLIs) to inform reliability and error budget decisions.
Conduct blameless post-incident reviews, identify systemic root causes, and implement long-term preventive measures.
Develop internal tools and automation to eliminate recurring manual operational work.
Create and maintain automated runbooks and playbooks for operational tasks and incident response.
Improve observability by turning high-cardinality monitoring data into actionable insights.
Proactively identify and mitigate operational risk through chaos engineering and architecture reviews.
Partner with software engineers early in the SDLC to design for reliability, scalability, and maintainability.
Continuously evaluate and optimize system performance, capacity, and cost efficiency.
Improve the on-call experience by reducing alert fatigue, lowering MTTR, and keeping on-call rotations sustainable.

Requirements

Bachelor’s degree in Computer Engineering or a similar discipline.
5+ years of experience as a Site Reliability Engineer or in a similar role.
3+ years of experience with AWS services, including strong knowledge of container orchestration.
2+ years of Kubernetes experience.
Deep understanding of observability principles and tools such as Prometheus, Datadog, and OpenTelemetry.
Experience leading incident management and complex postmortem analysis.
Experience with infrastructure as code, preferably Terraform.
Experience with chaos engineering and other resilience-testing techniques.
Experience with CI/CD tools such as GitHub Actions for automated delivery.
Proficiency in at least one programming language such as Python, Go, or Java for automation and internal tooling.
Experience with event-driven architectures built on services such as SNS and SQS.
Ability to work independently and collaboratively in a fast-paced environment.
Strong communication skills and fluency in English.
Experience with Scrum or other agile methods (preferred).
Certification such as AWS Certified DevOps Engineer or Certified Kubernetes Administrator (CKA) (preferred).
Experience with Telco Core Networks, low-latency networking, telecommunications, eSIM, or GSMA technologies (preferred).
Experience with AI-driven SRE tools for anomaly detection and improvements (preferred).
Contributions to open-source SRE projects or communities (preferred).

Benefits

Remote-first work environment with the option to work from anywhere.
Health insurance.
Work-from-anywhere stipend.
Annual wellness and learning credits.
Annual all-expenses-paid company retreat in a destination location.
Paid on-call rotation with standby fees and overtime pay.
No on-call duties during the first 6 months.
Guaranteed rest periods and flexible hours after night incidents.
