Airalo

Airalo is the world's first eSIM store offering travelers access to eSIMs in 200+ countries & regions at affordable prices. With Airalo, travelers can manage their eSIMs, top up on the go, and enjoy pain-free connectivity while traveling. Say goodbye t...

Airlines

Industrials

51-250 (150)

Founded 2019

$67M raised

18 open positions

Senior Site Reliability Engineer

3 weeks ago

Romania

Full-time

Senior

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Agile, AWS, Datadog, GitHub Actions, Go, Java, Kubernetes, OpenTelemetry, Prometheus, Python, Scrum, Terraform

Description

Lead the design of scalable, fault-tolerant, self-healing systems in a multi-region AWS environment.
Define and track SLOs and SLIs to guide architectural decisions and error budget policies.
Conduct blameless post-incident reviews to identify root causes and implement preventive measures.
Build internal tools and automation to eliminate manual operational work.
Develop and maintain automated runbooks and playbooks for operational tasks and incident response.
Improve observability by turning high-cardinality data into proactive, actionable insights.
Proactively identify and mitigate operational risks through chaos engineering and architecture reviews.
Work with software engineers early in the SDLC to design for reliability, scalability, and maintainability.
Continuously optimize system performance, capacity, and cost efficiency.
Refine the on-call experience to reduce alert fatigue, improve MTTR, and keep rotations sustainable.

Requirements

Bachelor’s degree in Computer Engineering or a similar discipline.
5+ years of experience as a Site Reliability Engineer or in a similar role.
3+ years of experience with AWS services, including strong knowledge of container orchestration.
2+ years of Kubernetes experience.
Deep understanding of observability principles and tools such as Prometheus, Datadog, and OpenTelemetry.
Experience leading incident management and complex postmortem analysis.
Experience with infrastructure as code, especially Terraform.
Experience with chaos engineering and other resilience-testing techniques.
Experience with CI/CD tools such as GitHub Actions for automated delivery.
Proficiency in at least one programming language such as Python, Go, or Java for automation and internal tooling.
Event-driven architecture experience with SNS, SQS, or similar technologies.
Ability to work independently and collaboratively in a fast-paced environment.
Strong communication skills and fluency in English.
Prior experience with Scrum or other agile methods (preferred).
Certification such as AWS Certified DevOps Engineer or Certified Kubernetes Administrator (CKA) (preferred).
Prior experience with Telco Core Networks, low-latency networking, telecommunications, eSIM, or GSMA-related technologies (preferred).
Experience with AI-driven SRE tools for anomaly detection and improvements (preferred).
Contributions to open-source SRE projects or communities (preferred).

Benefits

Fully remote work.
Generous PTO.
Wellness allowance.
Learning allowance.
Annual Airalo Away retreat.
Standby fees and overtime pay for on-call rotations.
Delayed on-call start for the first 6 months.
Guaranteed rest periods and flexible hours after night incidents.

Similar Roles

Assoc, Protocol Engineer (Chainlink)

Galaxy 251-1K Capital Markets

Galaxy is hiring an experienced Protocol, DevOps, or SRE Engineer to help build and operate secure blockchain infrastructure supporting its digital assets platform and custody offerings.

United States Full-time Mid Level Blockchain Developer Site Reliability Engineer (SRE)

AWS, Azure, Bash, Blockchain, C, C++, Datadog, Docker, ELK Stack, Encryption, Ethereum, GCP, Go, Grafana, Java, Kubernetes, Linux, Network Security, Perl, Prometheus, Python, Rust, Solana, Terraform

2 hours, 2 minutes ago

Senior Site Reliability Engineer

Parallel Domain 51-250 Aerospace & Defense

Parallel Domain is hiring a Senior Site Reliability Engineer to operate and evolve the infrastructure that powers large-scale simulation and validation for autonomous systems in a remote role across Canada and the U.S. Pacific Northwest.

United States Canada Full-time Senior Site Reliability Engineer (SRE)

$111k-$141k

Active Directory, Argo CD, AWS, Bash, DNS, Docker, GitHub Actions, Grafana, Helm, Kubernetes, Linux, Load Balancing, Packer, Prometheus, Python, Terraform

5 hours, 51 minutes ago

Site Reliability Engineer (Senior or Staff), Atlas

MongoDB 1K-5K Internet Software & Services

MongoDB is hiring a Senior Site Reliability Engineer for its Atlas team to help support, maintain, and grow a multi-cloud platform for customer-facing production workloads.

United States Full-time Senior Site Reliability Engineer (SRE)

$127k-$249k

AWS, Azure, DNS, GCP, Go, HTTP, Linux, Python, Ruby, TLS

7 hours, 58 minutes ago

Intermediate Site Reliability Engineer - OP02119

Dev.Pro 251-1K Internet Software & Services

Dev.Pro is hiring an IT Specialist for its SRE team to support company and client environments by maintaining infrastructure, monitoring services, and automating operations across cloud and on-premises systems.

Bulgaria Poland Portugal Full-time Mid Level Site Reliability Engineer (SRE)

Ansible, Apache, AWS, Bash, CI/CD, DHCP, DNS, Docker, ELK Stack, GCP, Git, Grafana, Jenkins, Linux, MySQL, Nginx, PostgreSQL, Prometheus, Puppet, Python, SQL, SQL Server, SSH, TCP/IP, TeamCity, Terraform, TLS, Ubuntu, Windows Server, Zabbix

10 hours ago

