The Investigo Group

Hiring Regions

We’re excited that you’re interested in joining our team! At the moment, we’re only able to hire applicants who are based in the UK (including Ireland) and the Netherlands. We hope to expand to more locations in the future, so thank you ...

Professional Services

Industrials

Founded 2023

3 open positions

Senior Site Reliability Engineer (SRE)

1 month, 1 week ago

United Kingdom

Full-time

Senior

Site Reliability Engineer (SRE)

DevOps and Infrastructure

Ansible, Argo CD, CI/CD, Flux, GitHub Actions, GitOps, Go, Grafana, Helm, Juniper, Kubernetes, Linux, Load Balancing, Machine Learning, OpenID Connect, OpenShift, OpenTelemetry, Palo Alto, Prometheus, Python, SAML, Shell Scripting, Terraform

Description

Operate, harden, and extend production OpenShift, OKD, and Kubernetes clusters across on-premises and hybrid environments.
Support the migration from VMware to KVM and help modernize the underlying compute and storage layer.
Own and improve CI/CD processes across the full lifecycle of platform and application components.
Develop and mature GitOps deployment practices using tools such as Argo CD or Flux.
Maintain core platform services including identity, ingress, observability, certificate management, service mesh, and container registry capabilities.
Build and operate observability across logs, metrics, traces, alerting, SLOs, and error budgets.
Improve platform hardening for secure and regulated environments, including network policy, SELinux, image provenance, secret management, and audit controls.
Automate repeatable operational tasks using infrastructure and scripting tools such as Ansible, Terraform, Helm, Kustomize, Go, or Python.
Lead incident response, support blameless post-mortems, and drive systemic fixes.
Partner with networking and security teams on platform integration, segmentation, load balancing, and accreditation evidence.
Create and maintain documentation, runbooks, design notes, and operational guidance.
Mentor other engineers and act as a senior technical authority across cloud and Kubernetes operations.

Requirements

Strong experience running production Kubernetes environments.
Strong Linux fundamentals, including systemd, networking, storage, and performance troubleshooting.
Experience with at least one Kubernetes distribution such as OKD, OpenShift, vanilla Kubernetes, Rancher, EKS, AKS, or GKE.
Solid infrastructure as code experience, including Ansible plus Terraform or equivalent, alongside tools such as Helm and Kustomize.
GitOps and CI/CD experience managing full application and component lifecycles, using tools such as Argo CD, Flux, GitHub Actions, or similar.
Experience with observability tooling such as Prometheus, Grafana, Elastic Stack/LGTM, or OpenTelemetry.
Experience working with identity and access technologies such as OIDC, SAML, SCIM, or Keycloak.
Experience with virtualization or infrastructure platforms such as KVM, libvirt, or VMware.
Scripting or tooling experience using Go, Python, shell scripting, or similar.
Experience working in secure, regulated, or enterprise-scale environments.
Strong troubleshooting, problem-solving, and analytical skills.
Strong communication skills with the ability to produce clear documentation, runbooks, post-mortems, and technical guidance.
Eligible to hold UK SC clearance, including the right to work in the UK and meeting the stated residency requirements.
Specific OpenShift or OKD experience, including operators, MachineConfig, or SCCs, is desirable.
Service mesh experience such as Istio or Linkerd is desirable.
Policy engine experience such as OPA, Gatekeeper, or Kyverno is desirable.
Cloud-native application deployment experience using Helm, Terraform, Kustomize, or similar is desirable.
Storage experience such as Ceph, Longhorn, or OpenShift Data Foundation is desirable.
Networking experience including BGP, VXLAN, Palo Alto, or Juniper technologies is desirable.
Software supply chain security experience, including SBOMs, image signing, admission control, or tools such as Sigstore, is desirable.
Experience operating AI, ML, or GPU-enabled platforms is desirable.
CKA, CKAD, CKS, Red Hat certifications, or equivalent are desirable.
Active or recent UK SC clearance is desirable.
Recognised open-source contributions to the Kubernetes ecosystem are desirable.
Calm, structured, and methodical under pressure.
Collaborative working style across platform, development, QA, security, networking, and architecture teams.
Strong sense of ownership and accountability.
Automation-first mindset with a focus on removing repeatable manual work.
Able to influence technical practice through evidence, example, and credibility.
Pragmatic and solutions-focused approach to problem solving.
Curious about why systems fail, not just how to bring them back online.
Comfortable mentoring others and raising the technical capability of those around you.
Able to balance reliability, delivery pace, security, and compliance in a regulated environment.

Benefits

Private medical health cash plan.
4x life assurance.
Generous holiday allowance.
Access to continuous learning and development opportunities.
Bonus potential based on performance and business-related factors.
Discounts on a wide range of products and services.
Pension scheme contributions.
EV car scheme.
Regular pay reviews.
