Engineer - HPC Platform

3 weeks, 1 day ago
Full-time
Senior
DevOps and Infrastructure
Xenon7

Xenon7 provides advanced AI solutions and consultancy services, leveraging a team of highly qualified experts and a strong emphasis on research and innovation to address complex industry challenges and enhance operational efficiency.

Internet Software & Services
Founded 2014

Description

  • Design, build, and maintain scalable HPC platforms and cluster architectures.
  • Lead engineering and operations for HPC infrastructure, ensuring availability and performance for scientific workloads.
  • Collaborate with researchers and scientists to optimize performance and streamline computational workflows.
  • Use tooling to automate orchestration, resource scheduling, data access, and reproducibility.
  • Evolve and operate both public cloud and on-premises environments for HPC use cases.
  • Define, monitor, and report infrastructure metrics and resource utilization to drive platform improvements.
  • Advance initiatives that enable critical business projects and identify opportunities to accelerate the HPC roadmap.
  • Apply agile ways of working to deploy and operate HPC solutions at scale.

Requirements

  • Bachelor’s degree in Computer Science, Information Technology, or a related technical field.
  • 5+ years of experience as an HPC Platform Engineer.
  • Demonstrated experience leading a large-scale, global infrastructure project.
  • Hands-on experience with HPC platforms, including accelerators (e.g., GPUs) and HPC schedulers (e.g., Altair Grid Engine, Slurm).
  • Experience with Kubernetes platforms and container technologies (Docker, Apptainer).
  • Demonstrated experience with HPC workloads, infrastructure, and cluster architectures.
  • Expertise with the Linux command line, Linux troubleshooting, and HPC administration.
  • Experience with DevOps and infrastructure-as-code tools such as GitHub, Chef, Ansible, and Terraform.
  • Experience automating infrastructure and applications, with strong programming/scripting skills in Python or Bash.
  • Continuous learning mindset and willingness to stay current with new HPC technologies and infrastructure trends.

Benefits

  • Attractive, market-leading salary package.
  • Clear career advancement path.
  • Professional development opportunities and support for learning new HPC technologies.
