Lead MLOps Engineer

5 hours, 10 minutes ago
Full-time
Lead
Software Development
NexGen Cloud

NexGen Cloud is Europe's leading sustainable cloud Infrastructure as a Service (IaaS) provider, specializing in high-performance computing (HPC) and GPU infrastructure. With a focus on sustainability and innovation, NexGen Cloud offers GPU as a Service...

IT Services
11-50 employees
Founded 2020

Description

  • Own the design, implementation, and evolution of core MLOps systems across Hyperstack, including the infrastructure and workflows behind AI Studio.
  • Build and improve orchestration systems for model training, fine-tuning, evaluation, and deployment for large GPU workloads.
  • Own production readiness for ML infrastructure, including monitoring, alerting, incident response, and continuous improvement.
  • Define and embed MLOps practices such as model versioning, reproducibility, deployment safety, rollback strategies, and environment management.
  • Provide technical leadership through architecture decisions, implementation guidance, and shared engineering standards.
  • Work closely with Product, Engineering, and cross-functional teams to shape the platform and its operating model.
  • Support reliable and repeatable production ML operations for complex, asynchronous, resource-intensive workloads.

Requirements

  • Proven experience designing, building, and operating production ML infrastructure, platform systems, or MLOps workflows in cloud environments.
  • Hands-on Python development experience, including backend systems, automation, and developer or platform tooling.
  • Experience supporting LLM, generative AI, or fine-tuning workflows in production, including training, evaluation, deployment, inference, and lifecycle management.
  • Production-grade experience with Docker, Kubernetes, CI/CD, and infrastructure-as-code in operational environments.
  • Experience owning complex, asynchronous, or resource-intensive workloads end to end, including orchestration, reliability, observability, and incident response.
  • Ability to work cross-functionally and provide technical leadership through influence across engineering teams.
  • Exposure to GPU-intensive, distributed, or performance-sensitive ML workloads is preferred.
  • Experience building internal developer platforms or tooling that improve experimentation, reproducibility, and delivery speed for ML teams is preferred.
  • Background in cloud infrastructure, platform products, or technically complex B2B software is preferred.

Benefits

  • Competitive salary and annual discretionary bonus scheme.
  • Employee wellbeing benefits.
  • 25 days of holiday plus public holidays.
  • Flexible working arrangements, including remote or hybrid options depending on role and location.
  • Real ownership and autonomy with the trust to take initiative and experiment.
  • Opportunity to make a visible, meaningful impact as the company scales.
  • Clear career progression and growth opportunities in a fast-growing company.
  • A collaborative, international culture built on trust, transparency, and ownership.
