The Investigo Group

Hiring Regions

We’re excited that you’re interested in joining our team! At the moment, we’re only able to hire applicants who are based in the UK (including Ireland) and the Netherlands. We hope to expand to more locations in the future, so thank you ...

Professional Services
Founded 2023

Description

  • Operate, harden, and extend production OpenShift, OKD, and Kubernetes clusters across on-premises and hybrid environments.
  • Support the migration from VMware to KVM and help modernize the underlying compute and storage layer.
  • Own and improve CI/CD processes across the full lifecycle of platform and application components.
  • Develop and mature GitOps deployment practices using tools such as Argo CD or Flux.
  • Maintain core platform services including identity, ingress, observability, certificate management, service mesh, and container registry capabilities.
  • Build and operate observability across logs, metrics, traces, alerting, SLOs, and error budgets.
  • Improve platform hardening for secure and regulated environments, including network policy, SELinux, image provenance, secret management, and audit controls.
  • Automate repeatable operational tasks using infrastructure and scripting tools such as Ansible, Terraform, Helm, Kustomize, Go, or Python.
  • Lead incident response, support blameless post-mortems, and drive systemic fixes.
  • Partner with networking and security teams on platform integration, segmentation, load balancing, and accreditation evidence.
  • Create and maintain documentation, runbooks, design notes, and operational guidance.
  • Mentor other engineers and act as a senior technical authority across cloud and Kubernetes operations.

Requirements

  • Strong experience running production Kubernetes environments.
  • Strong Linux fundamentals, including systemd, networking, storage, and performance troubleshooting.
  • Experience with at least one Kubernetes distribution such as OKD, OpenShift, vanilla Kubernetes, Rancher, EKS, AKS, or GKE.
  • Solid infrastructure-as-code experience, including Ansible plus Terraform or equivalent, alongside tools such as Helm and Kustomize.
  • GitOps and CI/CD experience managing full application and component lifecycles, using tools such as Argo CD, Flux, GitHub Actions, or similar.
  • Experience with observability tooling such as Prometheus, Grafana, Elastic Stack/LGTM, or OpenTelemetry.
  • Experience working with identity and access technologies such as OIDC, SAML, SCIM, or Keycloak.
  • Experience with virtualization or infrastructure platforms such as KVM, libvirt, or VMware.
  • Scripting or tooling experience using Go, Python, shell scripting, or similar.
  • Experience working in secure, regulated, or enterprise-scale environments.
  • Strong troubleshooting, problem-solving, and analytical skills.
  • Strong communication skills with the ability to produce clear documentation, runbooks, post-mortems, and technical guidance.
  • Eligible to hold UK SC clearance, which includes having the right to work in the UK and meeting the stated residency requirements.
  • Specific OpenShift or OKD experience, including operators, MachineConfig, or SCCs, is desirable.
  • Service mesh experience such as Istio or Linkerd is desirable.
  • Policy engine experience such as OPA, Gatekeeper, or Kyverno is desirable.
  • Cloud-native application deployment experience using Helm, Terraform, Kustomize, or similar is desirable.
  • Storage experience such as Ceph, Longhorn, or OpenShift Data Foundation is desirable.
  • Networking experience including BGP, VXLAN, Palo Alto, or Juniper technologies is desirable.
  • Software supply chain security experience, including SBOMs, image signing, admission control, or tools such as Sigstore, is desirable.
  • Experience operating AI, ML, or GPU-enabled platforms is desirable.
  • CKA, CKAD, CKS, Red Hat certifications, or equivalent are desirable.
  • Active or recent UK SC clearance is desirable.
  • Recognised open-source contributions to the Kubernetes ecosystem are desirable.
  • Calm, structured, and methodical under pressure.
  • Collaborative working style across platform, development, QA, security, networking, and architecture teams.
  • Strong sense of ownership and accountability.
  • Automation-first mindset with a focus on removing repeatable manual work.
  • Able to influence technical practice through evidence, example, and credibility.
  • Pragmatic and solutions-focused approach to problem solving.
  • Curious about why systems fail, not just how to bring them back online.
  • Comfortable mentoring others and raising the technical capability of those around you.
  • Able to balance reliability, delivery pace, security, and compliance in a regulated environment.

Benefits

  • Private medical health cash plan.
  • 4x life assurance.
  • Generous holiday allowance.
  • Access to continuous learning and development opportunities.
  • Bonus potential based on performance and business-related factors.
  • Discounts on a wide range of products and services.
  • Pension scheme contributions.
  • EV car scheme.
  • Regular pay reviews.

Interested in this position?

Apply directly on the company website
