Machine Learning Systems Engineer

1 month, 1 week ago
Full-time
Senior
DevOps and Infrastructure
Motional

Motional is a leading company in driverless technology and autonomous vehicles, leveraging decades of industry expertise to develop and deploy safe and reliable autonomous vehicles. With a powerful DNA combining Aptiv's automotive technology and Hyunda...

Automotive
1K-5K
Founded 2020
$20M raised

Description

  • Profile and optimize training performance by identifying bottlenecks in data loading, gradient computation, and communication.
  • Implement training optimizations such as kernel fusion, sharding, and tiling to reduce step time.
  • Optimize distributed training pipelines using PyTorch Distributed and related tooling.
  • Design and maintain high-performance GPU kernels in Triton or CUDA for ML workloads.
  • Improve data loading pipelines to maximize training throughput.
  • Work at the intersection of machine learning research and high-performance systems engineering to improve speed, cost, reliability, and throughput.
  • Help scale large distributed model training and reduce time to convergence for next-generation models.

Requirements

  • Bachelor’s degree, Master’s degree, or PhD in Computer Science, Computer Engineering, or a related technical discipline.
  • Strong proficiency in Python.
  • Extensive hands-on experience with PyTorch.
  • Experience optimizing machine learning model execution during training and inference.
  • Strong understanding of fundamental machine learning concepts, architectures, and processes.
  • Exceptional analytical and problem-solving skills.
  • Bias for action and a data-driven approach to technical challenges.
  • Experience with profiling tools such as Nsight and PyTorch Profiler is preferred.
  • Experience with Triton or CUDA is preferred.
  • Experience with distributed training frameworks such as PyTorch Distributed is preferred.

Benefits

  • Base salary range of $144,000 to $192,000 USD.
  • Additional compensation may include a bonus or company equity.
  • Medical, dental, and vision coverage.
  • 401(k) with company match.
  • Health savings accounts.
  • Life insurance.
  • Pet insurance.
  • Hybrid schedule with in-office time in Boston, Pittsburgh, or Las Vegas; fully remote work is also available.
