Arbor

Arbor is the leading cloud MIS provider in the UK, empowering schools and MATs to collaborate effectively, save time, and enhance pupil achievement through centralized data management and insightful analytics.

IT Services

Information Technology

51-250 (180)

12 open positions

Site Reliability Technical Lead

3 weeks, 1 day ago

United Kingdom

Full-time

Lead

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Agile, AWS, Datadog, Docker, Go, Kanban, Kubernetes, Microservices, Prometheus, Python, Scrum, Terraform

Description

Define and guide system architecture, balancing trade-offs between speed, scalability, maintainability, and security.
Ensure systems are observable and meet agreed Service Level Objectives (SLOs) from design through production.
Drive continuous improvement in platform reliability, performance, and efficiency.
Lead root cause analysis (RCA) and help improve the incident response process and framework.
Drive automation initiatives across the team to reduce operational toil and improve efficiency.
Uphold coding standards, promote automated testing, and support production readiness standards for all services.
Lead technical estimation, feasibility assessments, and structured release planning.
Mentor and coach engineers through feedback, knowledge sharing, and technical guidance.
Collaborate with Product Managers, Engineering Managers, and engineers to align technical direction with product strategy.
Communicate complex technical concepts clearly to technical and non-technical stakeholders.

Requirements

Extensive professional experience in SRE, DevOps, or Platform Engineering on complex, scalable systems.
Extensive expertise with AWS and distributed cloud architectures.
Proven experience operating high-traffic platforms serving around 1,000 requests per second.
Advanced proficiency with Terraform and configuration management tools.
Strong programming skills in Python, Go, or a similar language for automation and tooling.
Deep experience with monitoring and observability platforms such as Datadog or Prometheus, and with incident and problem management.
Expert understanding of distributed systems, microservices, and resilience patterns.
Hands-on experience with containerization and orchestration technologies including Docker, Kubernetes, or ECS.
Practical experience building and maintaining CI/CD pipelines for automated deployments.
Demonstrated ability to mentor and support the growth of fellow engineers.
Experience with chaos engineering and reliability testing is preferred.
Knowledge of security best practices and compliance frameworks is preferred.
Background in agile and lean methodologies such as Scrum or Kanban is preferred.
Contributions to open-source projects or the SRE community are preferred.

Benefits

Remote working arrangement.
Salary of £80,000 to £90,000.
32 days holiday, including Bank Holidays, made up of 25 days annual leave plus 7 company-wide days over Easter, Summer, and Christmas.
Life assurance at 3x annual salary.
Comprehensive wellness support through AIG Smart Health, including 24/7 virtual GP access, mental health support, counselling, and personalised health checks.
Private dental insurance with Bupa.
Salary sacrifice pension provided by Scottish Widows.
Enhanced maternity and adoption leave with 20 weeks at full pay, plus 6 weeks of paternity leave at full pay.
Access to Calm and Bippit for wellbeing and financial coaching.
Flexible working arrangements.
Dedicated professional development budget for CPD courses, upskilling resources, and memberships.
One paid volunteer day per year.
Dog-friendly offices.
Referral voucher valued up to £200 for successful friend referrals.
