Megaport

Megaport simplifies network connectivity with scalable bandwidth for cloud connections, metro Ethernet, and data centre backhaul. With extensive coverage in APAC and a growing global footprint, Megaport empowers users to manage their networks through its u...

Diversified Telecommunication Services

Telecommunication Services

251-1K employees (420)

Founded 2013

$26M raised

18 open positions

Senior Site Reliability Engineer

1 month, 2 weeks ago

Australia

Full-time

Senior

Site Reliability Engineer (SRE)

DevOps and Infrastructure

AWS, Bash, Cassandra, CI/CD, ClickHouse, GitHub, Go, Kubernetes, Linux, PostgreSQL, Python, Terraform

Description

Improve production reliability and system resilience within an SRE-scoped team.
Champion DevOps and SRE best practices and high standards of work.
Communicate with teams and stakeholders during requirements analysis, demonstrations, and delivery.
Investigate and resolve complex technical problems across multiple technologies.
Participate in on-call rotation, incident response, and blameless post-incident reviews.
Write code, handle alerts, improve solutions, and support other team members.
Work across globally distributed time zones in a self-directed, collaborative environment.
Contribute fresh ideas and help drive customer success and company goals.

Requirements

5+ years administering Linux systems and related infrastructure in production environments.
Strong understanding of SRE concepts such as SLIs, SLOs, SLAs, error budgets, blast radius, and blameless postmortems.
Focus on automation, toil reduction, and preventing recurring issues.
Experience writing effective runbooks for a broader team.
Strong Kubernetes and ecosystem fundamentals.
Cloud infrastructure experience, with AWS strongly preferred; bare-metal experience is a bonus.
Strong tool-development skills in Bash, with Python or Go also preferred.
Infrastructure-as-code experience, with Terraform preferred.
CI/CD and version control experience, with GitHub preferred.
Database experience with Postgres, Cassandra, or ClickHouse preferred.
Experience operating production observability stacks across metrics, logs, and traces.
Comfort working on live production infrastructure with strong troubleshooting and incident ownership.
A history of continual professional development.
Self-directed and comfortable working asynchronously with a globally distributed team.
Experience picking up adjacent work when needed.

Benefits

Flexible working environments.
Birthday leave.
Generous study and training allowance plus 5 days of paid study leave.
Creative, fun, and contemporary workspaces.
A motivated team of industry experts and new talent.
Recognition through ‘Legend’ and ‘Kudos’ awards.
Health and wellness program.
