Staff Machine Learning Engineer, Offline Infrastructure

2 hours, 15 minutes ago
Full-time
Lead
Software Development
Unity

Unity is the top platform for real-time 3D content creation, empowering creators across industries to bring their ideas to life with interactive 2D and 3D content.

Internet Software & Services
5K-10K
Founded 2004

Description

  • Design and operate large-scale data pipelines that generate datasets for machine learning training and experimentation.
  • Develop infrastructure that supports distributed training workflows using tools such as PyTorch, Ray Data, and Ray Train.
  • Integrate ML pipelines with workflow orchestration systems such as Flyte, Airflow, or similar platforms.
  • Improve reproducibility and observability through dataset validation, monitoring, and automated testing.
  • Optimize performance and resource utilization across distributed compute systems for data processing and model training.
  • Partner closely with ML engineers to support large-scale experimentation and model iteration.
  • Lead architectural improvements to keep offline ML pipelines scalable, reliable, and cost-efficient.

Requirements

  • Strong experience building large-scale ML pipelines.
  • Experience with distributed computing frameworks such as Ray, Spark, or Flink, including familiarity with the Ray ecosystem (Ray Data, Ray Train).
  • Experience building infrastructure for training data generation, dataset preparation, or ML feature pipelines.
  • Deep experience designing and operating production-grade data pipelines.
  • Strong programming skills in Python and experience with large-scale distributed workloads.
  • Experience with modern data infrastructure, including data lakes, data warehouses, orchestration systems, and streaming platforms.
  • Strong systems thinking with the ability to reason about performance, scalability, reliability, and cost tradeoffs in distributed systems.
  • Proven ability to lead technical direction and influence architectural decisions across teams without formal authority.
  • Strong command of English for frequent verbal and written professional communication with global colleagues and partners.
  • Experience with PyTorch, Flyte, or Airflow is preferred.

Benefits

  • Gross salary of $209,700 to $283,800 USD.
  • Comprehensive health, life, and disability insurance.
  • Commute subsidy.
  • Employee stock ownership.
  • Competitive retirement or pension plans.
  • Generous vacation and personal days.
  • Support for new parents through leave and family-care programs.
  • Mental health and wellbeing programs and support.
  • Training and development programs.
  • Volunteering and donation matching program.
