Staff Machine Learning Engineer, Offline Infrastructure

4 weeks, 1 day ago
Full-time
Lead
Software Development
Unity

Unity

Unity is the top platform for real-time 3D content creation, empowering creators across industries to bring their ideas to life with interactive 2D and 3D content.

Internet Software & Services
5K-10K employees
Founded 2004

Description

  • Design and operate large-scale data pipelines that generate datasets for machine learning model training and experimentation.
  • Develop infrastructure that supports distributed training workflows using tools such as PyTorch, Ray Data, and Ray Train.
  • Integrate ML pipelines with workflow orchestration systems such as Flyte, Airflow, or similar platforms.
  • Improve reproducibility and observability through dataset validation, monitoring, and automated testing.
  • Optimize performance and resource utilization across distributed compute systems for data processing and model training.
  • Partner closely with ML engineers to support large-scale experimentation and model iteration.
  • Lead architectural improvements to keep offline ML pipelines scalable, reliable, and cost-efficient.

Requirements

  • Strong experience building large-scale ML pipelines.
  • Experience with distributed computing frameworks such as Ray, Spark, or Flink, including familiarity with the Ray ecosystem (Ray Data, Ray Train).
  • Experience building infrastructure for training data generation, dataset preparation, or ML feature pipelines.
  • Deep experience designing and operating production-grade data pipelines.
  • Strong programming skills in Python and experience with large-scale distributed workloads.
  • Experience with modern data infrastructure, including data lakes, data warehouses, orchestration systems, and streaming platforms.
  • Strong systems thinking with the ability to reason about performance, scalability, reliability, and cost tradeoffs in distributed systems.
  • Proven ability to lead technical direction and influence architectural decisions across teams without formal authority.
  • Strong written and verbal English for frequent professional communication with global colleagues and partners.
  • Experience with PyTorch, Flyte, or Airflow is preferred.

Benefits

  • Gross salary of $209,700 to $283,800 USD.
  • Comprehensive health, life, and disability insurance.
  • Commute subsidy.
  • Employee stock ownership.
  • Competitive retirement or pension plans.
  • Generous vacation and personal days.
  • Support for new parents through leave and family-care programs.
  • Mental health and wellbeing programs and support.
  • Training and development programs.
  • Volunteering and donation matching program.

Interested in this position?

Apply directly on the company website
