Arbor

Arbor is the leading cloud MIS provider in the UK, empowering schools and MATs to collaborate effectively, save time, and enhance pupil achievement through centralized data management and insightful analytics.

IT Services
51-250

Description

  • Define and guide system architecture, balancing speed, scalability, maintainability, and security.
  • Champion reliability and performance by ensuring systems are observable and meet agreed SLOs.
  • Lead root cause analysis and help improve incident response processes and frameworks.
  • Drive automation initiatives to reduce operational toil and improve system efficiency.
  • Uphold coding and production-readiness standards, and promote automated testing.
  • Lead technical estimation, feasibility assessments, release planning, and post-release reviews.
  • Mentor and coach engineers through feedback, knowledge sharing, and technical guidance.
  • Collaborate with Product Managers, Engineering Managers, and engineers to align technical direction with product strategy.
  • Communicate complex technical concepts clearly to technical and non-technical stakeholders.

Requirements

  • Extensive professional experience in SRE, DevOps, or Platform Engineering on complex, scalable systems.
  • Extensive expertise with AWS and distributed cloud architectures.
  • Proven experience operating high-traffic platforms serving around 1,000 requests per second.
  • Advanced proficiency with Terraform and configuration management tools.
  • Strong programming skills in Python, Go, or a similar language for automation and tooling.
  • Deep experience with monitoring and observability platforms such as Datadog or Prometheus, plus incident and problem management.
  • Expert understanding of distributed systems, microservices, and resilience patterns.
  • Hands-on experience with containerization and orchestration technologies such as Docker, Kubernetes, or ECS.
  • Practical experience building and maintaining CI/CD pipelines for automated deployments.
  • Demonstrated ability to mentor and support the growth of fellow engineers.
  • Experience with chaos engineering and reliability testing is a bonus.
  • Knowledge of security best practices and compliance frameworks is a bonus.
  • Background in agile and lean methodologies such as Scrum or Kanban is a bonus.
  • Contributions to open-source projects or the SRE community are a bonus.
  • Visa sponsorship is not available for this role.

Benefits

  • Salary of £80,000 - £90,000.
  • Remote working.
  • 32 days holiday including Bank Holidays, made up of 25 days annual leave plus 7 company-wide days.
  • Life assurance at 3x annual salary.
  • Private dental insurance with Bupa.
  • Salary sacrifice pension provided by Scottish Widows.
  • Enhanced maternity and adoption leave of 20 weeks full pay, and paternity leave of 6 weeks full pay.
  • Access to wellbeing support, including mindfulness, mental health first aid training, Calm, Bippit, and AIG Smart Health with 24/7 virtual GP, counselling, and health checks.
  • Flexible working arrangements, agreed to suit individual needs.
  • Dedicated professional development budget for CPD courses, upskilling resources, and professional memberships.
  • Volunteer with a charity of your choice for one day each year.
  • Dog-friendly offices.
