NexGen Cloud

NexGen Cloud is Europe's leading sustainable cloud Infrastructure as a Service (IaaS) provider, specializing in high-performance computing (HPC) and GPU infrastructure. With a focus on sustainability and innovation, NexGen Cloud offers GPU as a Service...

IT Services
11-50 employees
Founded 2020

Description

  • Own the design, implementation, and evolution of core MLOps systems across Hyperstack, including the infrastructure and workflows behind AI Studio.
  • Build and improve orchestration systems for model training, fine-tuning, evaluation, and deployment for large GPU workloads.
  • Own production readiness for ML infrastructure, including monitoring, alerting, incident response, and continuous improvement.
  • Define and embed MLOps practices such as model versioning, reproducibility, deployment safety, rollback strategies, and environment management.
  • Provide technical leadership through architecture decisions, implementation guidance, and shared engineering standards.
  • Work closely with Product, Engineering, and cross-functional teams to shape the platform and its operating model.
  • Support reliable and repeatable production ML operations for complex, asynchronous, resource-intensive workloads.

Requirements

  • Proven experience designing, building, and operating production ML infrastructure, platform systems, or MLOps workflows in cloud environments.
  • Hands-on Python development experience, including backend systems, automation, and developer or platform tooling.
  • Experience supporting LLM, generative AI, or fine-tuning workflows in production, including training, evaluation, deployment, inference, and lifecycle management.
  • Production-grade experience with Docker, Kubernetes, CI/CD, and infrastructure-as-code in operational environments.
  • Experience owning complex, asynchronous, or resource-intensive workloads end to end, including orchestration, reliability, observability, and incident response.
  • Ability to work cross-functionally and provide technical leadership through influence across engineering teams.
  • Exposure to GPU-intensive, distributed, or performance-sensitive ML workloads is preferred.
  • Experience building internal developer platforms or tooling that improve experimentation, reproducibility, and delivery speed for ML teams is preferred.
  • Background in cloud infrastructure, platform products, or technically complex B2B software is preferred.

Benefits

  • Competitive salary and annual discretionary bonus scheme.
  • Employee wellbeing benefits.
  • 25 days of holiday plus public holidays.
  • Flexible working arrangements, including remote or hybrid options depending on role and location.
  • Real ownership and autonomy with the trust to take initiative and experiment.
  • Opportunity to make a visible, meaningful impact as the company scales.
  • Clear career progression and growth opportunities in a fast-growing company.
  • A collaborative, international culture built on trust, transparency, and ownership.
