Senior ML Engineer (Token Factory)

1 day, 20 hours ago
Nebius

Nebius enables B2B companies to build local hyperscale cloud platforms with cost-effective GPUs, InfiniBand networking, and 50% lower compute cost. It offers managed Kubernetes and a launch-ready business model for innovative cloud solutions.

Internet Software & Services
51-250 employees

Description

  • Identify and remove LLM inference bottlenecks to improve production performance.
  • Optimize large language model workloads across a wide range of architectures at scale.
  • Implement speculative decoding and other inference engine enhancements.
  • Optimize components of dense and MoE model designs, including autoregressive and parallel approaches.
  • Contribute to open-source inference engines.
  • Design and productionize low-precision training and inference pipelines using formats such as FP8 and NVFP4/MXFP4.
  • Measure and improve throughput, latency, and cost per token across tens of thousands of GPUs.

Requirements

  • Profound understanding of machine learning theory and transformer architecture.
  • Experience profiling GPU workloads with Nsight, PyTorch Profiler, or similar tools.
  • Understanding of GPU memory hierarchy and compute/memory tradeoffs.
  • Familiarity with LLM concepts such as MHA, RoPE, KV-cache, FlashAttention, and quantization.
  • Understanding of performance aspects of large neural network training, including sharding strategies, custom kernels, and hardware features.
  • Strong software engineering skills, with primary experience in Python.
  • Deep experience with modern deep learning frameworks.
  • Proficiency with CI/CD, version control, and unit testing.
  • Strong communication and leadership abilities.
  • Experience with open-source inference engines such as vLLM, SGLang, or TensorRT-LLM is preferred.
  • Experience with kernel languages or DSLs such as Triton, CuTe, CUTLASS, or CUDA is preferred.
  • Experience building and delivering products in a dynamic startup-like environment is preferred.
  • Experience developing large distributed systems or high-load web services is preferred.
  • Open-source projects that demonstrate engineering ability are preferred.
  • Excellent English writing, articulation, and communication skills are preferred.

Benefits

  • Competitive salary and comprehensive benefits package.
  • Opportunities for professional growth within Nebius.
  • Flexible working arrangements.
  • A dynamic, collaborative work environment that values initiative and innovation.
