The Investigo Group

Hiring Regions

We’re excited that you’re interested in joining our team! At the moment, we’re only able to hire applicants who are based in the UK (including Ireland) and the Netherlands. We hope to expand to more locations in the future, so thank you ...

Professional Services
Founded 2023

Description

  • Operate, harden, and extend production OpenShift, OKD, and Kubernetes clusters across on-premises and hybrid environments.
  • Support the migration from VMware to KVM and help modernize the underlying compute and storage layer.
  • Own and improve CI/CD processes across the full lifecycle of platform and application components.
  • Develop and mature GitOps deployment practices using tools such as Argo CD or Flux.
  • Maintain core platform services including identity, ingress, observability, certificate management, service mesh, and container registry capabilities.
  • Build and operate observability across logs, metrics, traces, alerting, SLOs, and error budgets.
  • Improve platform hardening for secure and regulated environments, including network policy, SELinux, image provenance, secret management, and audit controls.
  • Automate repeatable operational tasks using infrastructure and scripting tools such as Ansible, Terraform, Helm, Kustomize, Go, or Python.
  • Lead incident response, support blameless post-mortems, and drive systemic fixes.
  • Partner with networking and security teams on platform integration, segmentation, load balancing, and accreditation evidence.
  • Create and maintain documentation, runbooks, design notes, and operational guidance.
  • Mentor other engineers and act as a senior technical authority across cloud and Kubernetes operations.

Requirements

  • Strong experience running production Kubernetes environments.
  • Strong Linux fundamentals, including systemd, networking, storage, and performance troubleshooting.
  • Experience with at least one Kubernetes distribution such as OKD, OpenShift, vanilla Kubernetes, Rancher, EKS, AKS, or GKE.
  • Solid infrastructure-as-code experience, including Ansible and Terraform (or equivalent), alongside tools such as Helm and Kustomize.
  • GitOps and CI/CD experience managing full application and component lifecycles, using tools such as Argo CD, Flux, GitHub Actions, or similar.
  • Experience with observability tooling such as Prometheus, Grafana, Elastic Stack/LGTM, or OpenTelemetry.
  • Experience working with identity and access technologies such as OIDC, SAML, SCIM, or Keycloak.
  • Experience with virtualization or infrastructure platforms such as KVM, libvirt, or VMware.
  • Scripting or tooling experience using Go, Python, shell scripting, or similar.
  • Experience working in secure, regulated, or enterprise-scale environments.
  • Strong troubleshooting, problem-solving, and analytical skills.
  • Strong communication skills with the ability to produce clear documentation, runbooks, post-mortems, and technical guidance.
  • Eligible to hold UK Security Check (SC) clearance, which includes having the right to work in the UK and meeting the stated residency requirements.
  • Specific OpenShift or OKD experience, including operators, MachineConfig, or SCCs, is desirable.
  • Service mesh experience such as Istio or Linkerd is desirable.
  • Policy engine experience such as OPA, Gatekeeper, or Kyverno is desirable.
  • Cloud-native application deployment experience using Helm, Terraform, Kustomize, or similar is desirable.
  • Storage experience such as Ceph, Longhorn, or OpenShift Data Foundation is desirable.
  • Networking experience including BGP, VXLAN, Palo Alto, or Juniper technologies is desirable.
  • Software supply chain security experience, including SBOMs, image signing, admission control, or tools such as Sigstore, is desirable.
  • Experience operating AI, ML, or GPU-enabled platforms is desirable.
  • CKA, CKAD, CKS, Red Hat certifications, or equivalent are desirable.
  • Active or recent UK SC clearance is desirable.
  • Recognised open-source contributions to the Kubernetes ecosystem are desirable.
  • Calm, structured, and methodical under pressure.
  • Collaborative working style across platform, development, QA, security, networking, and architecture teams.
  • Strong sense of ownership and accountability.
  • Automation-first mindset with a focus on removing repeatable manual work.
  • Able to influence technical practice through evidence, example, and credibility.
  • Pragmatic and solutions-focused approach to problem solving.
  • Curious about why systems fail, not just how to bring them back online.
  • Comfortable mentoring others and raising the technical capability of those around you.
  • Able to balance reliability, delivery pace, security, and compliance in a regulated environment.

Benefits

  • Private medical health cash plan.
  • 4x life assurance.
  • Generous holiday allowance.
  • Access to continuous learning and development opportunities.
  • Bonus potential based on performance and business-related factors.
  • Discounts on a wide range of products and services.
  • Pension scheme contributions.
  • EV car scheme.
  • Regular pay reviews.
