Machine Learning Systems Engineer

3 weeks, 5 days ago
Full-time
Senior
DevOps and Infrastructure
Motional

Motional is a leading company in driverless technology and autonomous vehicles, leveraging decades of industry expertise to develop and deploy safe and reliable autonomous vehicles. With a powerful DNA combining Aptiv's automotive technology and Hyunda...

Automotive
1K-5K
Founded 2020
$20M raised

Description

  • Profile and optimize training performance across data loading, gradient computation, and communication bottlenecks.
  • Implement system-level optimizations such as kernel fusion, sharding, and tiling to reduce step time.
  • Optimize distributed training pipelines using PyTorch Distributed and related tooling.
  • Design and maintain high-performance GPU kernels in Triton or CUDA for machine learning workloads.
  • Improve and harden data loading pipelines to maximize training throughput.
  • Work at the intersection of machine learning research and high-performance systems engineering to support frontier model training at scale.

Requirements

  • Bachelor’s degree, Master’s degree, or PhD in Computer Science, Computer Engineering, or a related technical discipline.
  • Strong proficiency in Python.
  • Extensive hands-on experience with PyTorch.
  • Experience optimizing machine learning model execution during training and inference.
  • Strong understanding of fundamental machine learning concepts, architectures, and processes.
  • Exceptional analytical and problem-solving skills with a bias for action and a data-driven approach.
  • Experience with profiling tools such as Nsight and PyTorch Profiler.
  • Experience with Triton or CUDA for GPU kernel development (preferred).
  • Experience with distributed training frameworks such as PyTorch Distributed (preferred).

Benefits

  • Base salary range of $144,000 to $192,000 USD.
  • Potential additional compensation such as a bonus or company equity.
  • Medical, dental, and vision coverage.
  • 401(k) with company match.
  • Health savings accounts.
  • Life insurance.
  • Pet insurance.
  • Flexible hybrid schedule with in-office time in Boston, Pittsburgh, or Las Vegas, or the option to work fully remotely.
