Relativity Space

Relativity Space is a cutting-edge rocket company using 3D printing and AI to provide cost-effective reusable rockets for commercial launches, with a vision to advance industrial capabilities on Earth and Mars.

Aerospace & Defense

Industrials

251-1K (700)

Founded 2015

$1333M raised

178 open positions

Staff GPU Systems Engineer, Space Computing

15 hours, 44 minutes ago

United States

Full-time

Senior

Machine Learning Engineer

DevOps and Infrastructure

C C++ Docker Linux Machine Learning Podman Python PyTorch TensorFlow

Description

Own the GPU compute environment, including setup, driver integration, container runtime, job scheduling, and performance optimization.
Profile and optimize compute performance across the full stack, including GPU utilization, memory bandwidth, I/O throughput, and storage interface performance.
Build power- and thermal-aware compute scheduling that aligns batch workloads with orbital constraints.
Develop compute health monitoring and upset recovery mechanisms such as checkpoint/restart, GPU fault detection, and automated recovery.
Integrate GPU drivers with the payload Linux image in coordination with the Platform RE team.
Manage the container runtime for compute workloads.
Ensure the platform reliably runs ML frameworks and SAR processing pipelines maintained by the broader operations team.

Requirements

BS or MS in Computer Science or Electrical Engineering.
5+ years of relevant experience.
Hands-on experience with GPU programming and compute frameworks such as CUDA, ROCm, or OpenCL.
Demonstrated experience profiling and optimizing the performance of GPU workloads.
Strong Linux systems administration and performance tuning skills.
Experience with container technologies such as Docker, Podman, or lightweight alternatives.
Experience with HPC job scheduling concepts.
Working proficiency in Python for tooling, scripting, and ML framework integration.
C/C++ skills for performance-critical system components.
Experience with HPC cluster administration, ML infrastructure, or cloud GPU compute platforms at scale is preferred.
Deep familiarity with ML framework runtime requirements, including PyTorch or TensorFlow deployment, model serving, and inference optimization, is preferred.
Knowledge of GPU compute architectures at the hardware level is preferred.
Experience with high-throughput data movement and storage I/O optimization is preferred.
Background in power-managed computing, including duty cycling, thermal throttling, and workload scheduling under variable power constraints, is preferred.
Experience designing checkpoint/restart or fault-tolerant batch processing systems is preferred.

Benefits

Competitive salary with a hiring range of $181,000 to $248,500 USD.
Equity compensation.
Generous PTO and sick leave policy.
Parental leave.
Annual learning and development stipend.
Additional benefits and perks available through the company benefits program.
Reasonable accommodation support during the hiring process.

Similar Roles

Systems Engineer (AV/VTC)

MetroStar 251-1K IT Services

MetroStar is hiring a Systems Engineer (AV/VTC) to support and maintain video teleconferencing systems through design, installation, administration, troubleshooting, and asset inventory management for regional field offices.

United States Full-time Junior Systems Engineer

$85k-$97k

System Design

25 minutes ago

AI Security - AI Platform Team Lead

Cato Networks 251-1K Diversified Telecommunication Services

Cato Networks is hiring an AI Platform Team Lead to build and lead the runtime infrastructure for large-scale AI security models across its global cloud and physical points of presence.

Israel Lead AI Engineer Machine Learning Engineer

C++ Docker Go Java Kubernetes MLOps PyTorch Rust System Design

54 minutes ago

Senior Linux Systems Engineer, Edge Compute and Communications - Active Clearance Required

Anduril Industries 1K-5K Aerospace & Defense

Anduril Industries is hiring a Senior Linux Systems Engineer to support sensitive classified defense programs by building and maintaining tactical edge computing infrastructure for UAS products.

United States Full-time Senior Security Engineer Systems Engineer

$170k-$210k

Active Directory Bash Linux PowerShell

1 hour, 10 minutes ago

ML Tech Lead (GenAI, AWS)

Provectus 251-1K Professional Services

Provectus is hiring an ML Tech Lead for the AI practice within its Engineering team to guide the design and delivery of production GenAI and machine learning systems in a fully remote B2B setup.

Colombia Full-time Lead Machine Learning Engineer Technical Lead

AWS CI/CD Generative AI Git LLM Machine Learning MLOps PyTorch Scikit-learn TensorFlow

1 hour, 10 minutes ago
