Engineer - HPC Platform

1 month, 1 week ago
Full-time
Senior
DevOps and Infrastructure
Xenon7

Xenon7 provides advanced AI solutions and consultancy services, leveraging a team of highly qualified experts and a strong emphasis on research and innovation to address complex industry challenges and enhance operational efficiency.

Internet Software & Services
Founded 2014

Description

  • Design, build, and maintain scalable HPC platforms and cluster architectures.
  • Lead engineering and operations for HPC infrastructure, ensuring availability and performance for scientific workloads.
  • Collaborate with researchers and scientists to optimize performance and streamline computational workflows.
  • Build tooling to automate orchestration, resource scheduling, data access, and reproducibility.
  • Evolve and operate both public cloud and on-premises environments for HPC use cases.
  • Define, monitor, and report infrastructure metrics and resource utilization to drive platform improvements.
  • Advance initiatives that enable critical business projects and identify opportunities to accelerate the HPC roadmap.
  • Apply agile ways of working to deploy and operate HPC solutions at scale.

Requirements

  • Bachelor’s degree in Computer Science, Information Technology, or a related technical field.
  • 5+ years of experience as an HPC Platform Engineer.
  • Demonstrated experience leading a large-scale, global infrastructure project.
  • Hands-on experience with HPC platforms, including accelerators (e.g., GPUs) and HPC schedulers (e.g., Altair Grid Engine, Slurm).
  • Experience with Kubernetes platforms and container technologies (Docker, Apptainer).
  • Demonstrated experience with HPC workloads, infrastructure, and cluster architectures.
  • Expertise with the Linux command line, Linux troubleshooting, and HPC administration.
  • Experience with DevOps and infrastructure-as-code tools such as GitHub, Chef, Ansible, and Terraform.
  • Experience automating infrastructure and applications, with strong programming or scripting skills in Python or Bash.
  • Continuous learning mindset and willingness to stay current with new HPC technologies and infrastructure trends.

Benefits

  • Attractive, market-leading salary package.
  • Clear career advancement path.
  • Professional development opportunities and support for learning new HPC technologies.
