Machine Learning Systems Engineer

3 weeks, 1 day ago
Full-time
Mid Level
DevOps and Infrastructure
Motional

Motional is a leading company in driverless technology and autonomous vehicles, leveraging decades of industry expertise to develop and deploy safe and reliable autonomous vehicles. With a powerful DNA combining Aptiv's automotive technology and Hyunda...

Automotive
1K-5K
Founded 2020
$20M raised

Description

  • Profile and optimize training performance by identifying bottlenecks in data loading, gradient computation, and communication.
  • Implement system-level optimizations such as kernel fusion, sharding, and tiling to improve step time.
  • Optimize distributed training pipelines using frameworks such as PyTorch Distributed.
  • Design and maintain high-performance GPU kernels in Triton or CUDA for machine learning workloads.
  • Engineer robust data loading pipelines that maximize training throughput.
  • Work at the intersection of machine learning research and high-performance systems engineering to support large-scale distributed training.
  • Improve core infrastructure that enables researchers to train frontier models at scale.

Requirements

  • Bachelor’s degree, Master’s degree, or PhD in Computer Science, Computer Engineering, or a related technical discipline.
  • Strong proficiency in Python.
  • Extensive hands-on experience with PyTorch.
  • Experience optimizing machine learning model execution during training and inference.
  • Strong understanding of fundamental machine learning concepts, architectures, and processes.
  • Exceptional analytical and problem-solving skills with a bias for action and a data-driven approach.
  • Experience with profiling tools such as Nsight or PyTorch Profiler (preferred).
  • Experience with Triton or CUDA for GPU kernel development (preferred).
  • Experience with distributed training frameworks such as PyTorch Distributed (preferred).

Benefits

  • Hybrid schedule with in-office time in Boston, Pittsburgh, or Las Vegas, or the option to work fully remotely.
  • Base salary range of $144,000 to $192,000 USD.
  • Additional compensation may include a bonus or company equity.
  • Medical, dental, and vision insurance.
  • 401(k) with company match.
  • Health savings accounts.
  • Life insurance and pet insurance.
