Engineer - HPC Platform

14 hours, 30 minutes ago
Full-time
Senior
DevOps and Infrastructure
Xenon7

Xenon7 provides advanced AI solutions and consultancy services, leveraging a team of highly qualified experts and a strong emphasis on research and innovation to address complex industry challenges and enhance operational efficiency.

Internet Software & Services
Founded 2014

Description

  • Design, build, and maintain scalable HPC platforms and cluster architectures.
  • Lead engineering and operations for HPC infrastructure, ensuring availability and performance for scientific workloads.
  • Collaborate with researchers and scientists to optimize performance and streamline computational workflows.
  • Use tooling to automate orchestration, resource scheduling, data access, and reproducibility.
  • Evolve and operate both public cloud and on-premises environments for HPC use cases.
  • Define, monitor, and report infrastructure metrics and resource utilization to drive platform improvements.
  • Advance initiatives that enable critical business projects and identify opportunities to accelerate the HPC roadmap.
  • Apply agile ways of working to deploy and operate HPC solutions at scale.

Requirements

  • Bachelor’s degree in Computer Science, Information Technology, or a related technical field.
  • 5+ years of experience as an HPC Platform Engineer.
  • Demonstrated experience leading a large-scale, global infrastructure project.
  • Hands-on experience with HPC platforms, including accelerators (e.g., GPUs) and HPC schedulers (e.g., Altair Grid Engine, Slurm).
  • Experience with Kubernetes platforms and container technologies (Docker, Apptainer).
  • Demonstrated experience with HPC workloads, infrastructure, and cluster architectures.
  • Expertise with the Linux command line, Linux troubleshooting, and HPC administration.
  • Experience with DevOps and infrastructure-as-code tools such as GitHub, Chef, Ansible, and Terraform.
  • Experience automating infrastructure and applications, with strong programming/scripting skills in Python or Bash.
  • Continuous learning mindset and willingness to stay current with new HPC technologies and infrastructure trends.

Benefits

  • Attractive, market-leading salary package.
  • Clear career advancement path.
  • Professional development opportunities and support for learning new HPC technologies.
