Airalo

Airalo is the world's first eSIM store offering travelers access to eSIMs in 200+ countries & regions at affordable prices. With Airalo, travelers can manage their eSIMs, top up on the go, and enjoy pain-free connectivity while traveling. Say goodbye t...

Airlines

Industrials

51-250 (150)

Founded 2019

$67M raised

18 open positions

Senior Site Reliability Engineer

2 hours, 51 minutes ago

Romania

Full-time

Senior

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Agile, AWS, Datadog, GitHub Actions, Go, Java, Kubernetes, OpenTelemetry, Prometheus, Python, Scrum, Terraform

Description

Lead the design of scalable, fault-tolerant, self-healing systems in a multi-region AWS environment.
Define and track SLOs and SLIs to guide architectural decisions and error budget policies.
Conduct blameless post-incident reviews to identify root causes and implement preventive measures.
Build internal tools and automation to eliminate manual operational work.
Develop and maintain automated runbooks and playbooks for operational tasks and incident response.
Improve observability by turning high-cardinality data into proactive, actionable insights.
Proactively identify and mitigate operational risks through chaos engineering and architecture reviews.
Work with software engineers early in the SDLC to design for reliability, scalability, and maintainability.
Continuously optimize system performance, capacity, and cost efficiency.
Refine the on-call experience to reduce alert fatigue, improve MTTR, and keep rotations sustainable.

Requirements

Bachelor’s degree in Computer Engineering or a similar discipline.
5+ years of experience as a Site Reliability Engineer or in a similar role.
3+ years of experience with AWS services, including strong knowledge of container orchestration.
2+ years of Kubernetes experience.
Deep understanding of observability principles and tools such as Prometheus, Datadog, and OpenTelemetry.
Experience leading incident management and complex postmortem analysis.
Experience with infrastructure as code, especially Terraform.
Experience with chaos engineering and other resilience-testing techniques.
Experience with CI/CD tools such as GitHub Actions for automated delivery.
Proficiency in at least one programming language such as Python, Go, or Java for automation and internal tooling.
Event-driven architecture experience with SNS, SQS, or similar technologies.
Ability to work independently and collaboratively in a fast-paced environment.
Strong communication skills and fluency in English.
Prior experience with Scrum or other agile methods (preferred).
Certification such as AWS Certified DevOps Engineer or Certified Kubernetes Administrator (CKA) (preferred).
Prior experience with telco core networks, low-latency networking, telecommunications, eSIM, or GSMA-related technologies (preferred).
Experience with AI-driven SRE tools for anomaly detection and improvements (preferred).
Contributions to open-source SRE projects or communities (preferred).

Benefits

Fully remote work.
Generous PTO.
Wellness allowance.
Learning allowance.
Annual Airalo Away retreat.
Standby fees and overtime pay for on-call rotations.
Delayed on-call start for the first 6 months.
Guaranteed rest periods and flexible hours after night incidents.

Similar Roles

Senior Database Reliability Engineer

PointClickCare; 1K-5K; Health Care Providers & Services

PointClickCare is hiring a Senior Database Reliability Engineer to manage and improve the cloud database infrastructure behind its mission-critical SaaS platform.

United States; Full-time; Senior; Site Reliability Engineer (SRE)

$146k-$162k

Ansible, AWS, Azure, C#, Databricks, GCP, Git, Grafana, InfluxDB, JIRA, MySQL, PostgreSQL, PowerShell, Python, SQL, SQL Server, Terraform

21 minutes ago

Site Reliability Engineer

SwissBorg; 51-250; Capital Markets

SwissBorg is hiring a Site Reliability Engineer to support and scale its cloud infrastructure and operations for a fast-growing crypto investment platform.

Belgium, Estonia, Greece, Hungary, Poland, Portugal, Romania; Full-time; Mid Level; Site Reliability Engineer (SRE)

Ansible, Argo CD, AWS, CI/CD, DNS, Git, GitLab, GitOps, Grafana, Kafka, Kubernetes, OpenSearch, OpenTelemetry, PostgreSQL, Prometheus, Terraform

36 minutes ago

Staff Platform Site Reliability Specialist (Observability & Kubernetes)

Everbridge; 1K-5K; Internet Software & Services

Everbridge is hiring a Staff Platform Site Reliability Specialist to own and evolve its enterprise observability platform and Kubernetes environment across a large-scale cloud-native infrastructure.

Canada; Full-time; Senior; Site Reliability Engineer (SRE)

$135k-$165k

AWS, GCP, Grafana, Kubernetes, Terraform

36 minutes ago

LiveOps Engineer

Civica; 1K-5K; Internet Software & Services

Civica is seeking a LiveOps Engineer to help operate and improve its cloud and production environments that support critical public services for citizens worldwide.

United Kingdom; Full-time; Mid Level; Site Reliability Engineer (SRE)

Ansible, AWS, Azure, Bash, CI/CD, Datadog, DNS, Docker, Elasticsearch, Git, GitHub Actions, Go, Grafana, Helm, Jenkins, Kubernetes, Load Balancing, PowerShell, Prometheus, Python, Terraform

1 hour, 51 minutes ago
