Machine Learning Systems Engineer

3 weeks, 1 day ago
Full-time
Senior
DevOps and Infrastructure
Motional

Motional is a leading company in driverless technology and autonomous vehicles, leveraging decades of industry expertise to develop and deploy safe and reliable autonomous vehicles. With a powerful DNA combining Aptiv's automotive technology and Hyunda...

Automotive
1K-5K
Founded 2020
$20M raised

Description

  • Profile and optimize training performance by identifying bottlenecks in data loading, gradient computation, and communication.
  • Implement training optimizations such as kernel fusion, sharding, and tiling to reduce step time.
  • Optimize distributed training pipelines using PyTorch Distributed and related tooling.
  • Design and maintain high-performance GPU kernels in Triton or CUDA for ML workloads.
  • Improve data loading pipelines to maximize training throughput.
  • Work at the intersection of machine learning research and high-performance systems engineering to improve speed, cost, reliability, and throughput.
  • Help scale large distributed model training and reduce time to convergence for next-generation models.

Requirements

  • Bachelor’s, Master’s, or PhD degree in Computer Science, Computer Engineering, or a related technical discipline.
  • Strong proficiency in Python.
  • Extensive hands-on experience with PyTorch.
  • Experience optimizing machine learning model execution during training and inference.
  • Strong understanding of fundamental machine learning concepts, architectures, and processes.
  • Exceptional analytical and problem-solving skills.
  • Bias for action and a data-driven approach to technical challenges.
  • Experience with profiling tools such as Nsight and PyTorch Profiler is preferred.
  • Experience with Triton or CUDA is preferred.
  • Experience with distributed training frameworks such as PyTorch Distributed is preferred.

Benefits

  • Base salary range of $144,000 to $192,000 USD.
  • Additional compensation may include a bonus or company equity.
  • Medical, dental, and vision coverage.
  • 401(k) with company match.
  • Health savings accounts.
  • Life insurance.
  • Pet insurance.
  • Hybrid schedule with in-office time in Boston, Pittsburgh, or Las Vegas; fully remote work is also available.
