Senior Systems HPC Engineer

11 hours, 39 minutes ago
Full-time
Senior
Software Development
Nebius

Nebius enables B2B companies to build local hyperscale cloud platforms with cost-effective GPUs, InfiniBand networking, and 50% lower compute costs. It offers managed Kubernetes and a launch-ready business model for innovative cloud solutions.

Internet Software & Services
51-250

Description

  • Analyze and optimize the performance of large-scale GPU clusters across the full stack.
  • Investigate and troubleshoot performance issues in GPU clusters under real training and inference workloads.
  • Identify system bottlenecks across hardware, system software, networking, and distributed communication layers.
  • Evaluate and integrate new hardware, system configurations, and tuning approaches into the software stack.
  • Support complex performance-related escalations from internal teams and customers.
  • Collaborate closely with infrastructure, software engineering, and hardware vendor teams such as NVIDIA, Mellanox, and Intel.
  • Contribute to hardware and cluster qualification to ensure systems meet performance expectations.
  • Drive improvements that influence how clusters are built, operated, tuned, and validated.

Requirements

  • 5+ years of professional experience in system-level software development focused on performance optimization and low-level programming.
  • 3+ years of hands-on experience with Linux systems, including administration, troubleshooting, and performance tuning.
  • In-depth understanding of server architecture, including PCIe devices, NICs, Linux OS/kernel, and HPC systems.
  • Strong proficiency in one or more performance-oriented programming languages such as C, C++, Go, or Python.
  • Experience working across networking technologies including InfiniBand and RoCE.
  • Experience with virtualization technologies such as KVM and QEMU.
  • Experience with distributed communication layers such as MPI and NCCL.
  • The hiring process includes a mandatory coding interview.

Benefits

  • Competitive salary and comprehensive benefits package.
  • Opportunities for professional growth within Nebius.
  • Flexible working arrangements.
  • A dynamic and collaborative work environment that values initiative and innovation.
