Nebius

Nebius enables B2B companies to build local hyperscale cloud platforms with cost-effective GPUs, InfiniBand networking, and 50% lower compute costs. It offers managed Kubernetes and a launch-ready business model for innovative cloud solutions.

Industry: Internet Software & Services
Company size: 51-250

Description

  • Improve fine-tuning methods for large language models, including LoRA-based and full-parameter approaches.
  • Optimize inference performance by identifying bottlenecks and driving production speedups.
  • Build model training and evaluation pipelines in JAX for speculative decoding and related experiments.
  • Experiment with model architectures, including dense and MoE designs and autoregressive and parallel decoding approaches.
  • Derive scaling laws to guide resource allocation and system design.
  • Investigate low-precision training and inference methods such as FP8 and NVFP4/MXFP4.
  • Work on supervised fine-tuning and reinforcement learning workflows for modern hardware.
  • Collaborate on building a fast, reliable, and scalable platform for training and deploying foundation models.
  • Contribute to engineering and product development in a highly technical cloud infrastructure environment.

Requirements

  • Profound understanding of the theoretical foundations of machine learning and reinforcement learning.
  • Deep expertise in modern deep learning for language processing and generation.
  • Experience training large models across multiple computational nodes.
  • Understanding of performance considerations for large neural network training, including sharding strategies, custom kernels, and hardware features.
  • Strong software engineering skills, particularly in Python.
  • Deep experience with modern deep learning frameworks, especially JAX.
  • Proficiency with CI/CD, version control, and unit testing.
  • Strong communication and leadership abilities.
  • Experience working with language models or similar NLP technologies is preferred.
  • Familiarity with LLM concepts such as MHA, RoPE, ZeRO/FSDP, Flash Attention, and quantization is preferred.
  • Track record of building and delivering products in a dynamic startup-like environment is preferred.
  • Experience developing large distributed systems or high-load web services is preferred.
  • Open-source projects that demonstrate engineering ability are preferred.
  • Excellent command of English, including strong writing and communication skills, is preferred.

Benefits

  • Competitive salary and comprehensive benefits package.
  • Opportunities for professional growth within Nebius.
  • Flexible working arrangements.
  • A dynamic and collaborative work environment that values initiative and innovation.
