Machine Learning Systems Engineer

Full-time
Senior
DevOps and Infrastructure
Motional

Motional is a leading company in driverless technology and autonomous vehicles, leveraging decades of industry expertise to develop and deploy safe and reliable autonomous vehicles. With a powerful DNA combining Aptiv's automotive technology and Hyunda...

Automotive
1K-5K
Founded 2020
$20M raised

Description

  • Profile and optimize training performance across data loading, gradient computation, and communication bottlenecks.
  • Implement system-level optimizations such as kernel fusion, sharding, and tiling to reduce step time.
  • Optimize distributed training pipelines using PyTorch Distributed and related tooling.
  • Design and maintain high-performance GPU kernels in Triton or CUDA for machine learning workloads.
  • Improve and harden data loading pipelines to maximize training throughput.
  • Work at the intersection of machine learning research and high-performance systems engineering to support frontier model training at scale.

Requirements

  • Bachelor’s degree, Master’s degree, or PhD in Computer Science, Computer Engineering, or a related technical discipline.
  • Strong proficiency in Python.
  • Extensive hands-on experience with PyTorch.
  • Experience optimizing machine learning model execution during training and inference.
  • Strong understanding of fundamental machine learning concepts, architectures, and processes.
  • Exceptional analytical and problem-solving skills with a bias for action and a data-driven approach.
  • Experience with profiling tools such as Nsight and PyTorch Profiler.
  • Experience with Triton or CUDA for GPU kernel development (preferred).
  • Experience with distributed training frameworks such as PyTorch Distributed (preferred).

Benefits

  • Base salary range of $144,000 to $192,000 USD.
  • Potential additional compensation such as a bonus or company equity.
  • Medical, dental, and vision coverage.
  • 401(k) with company match.
  • Health savings accounts.
  • Life insurance.
  • Pet insurance.
  • Flexible hybrid schedule with in-office time in Boston, Pittsburgh, or Las Vegas, or the option to work fully remotely.
