JetBrains

JetBrains builds cutting-edge development tools such as IntelliJ IDEA and created the Kotlin programming language, automating routine tasks to boost developer productivity and foster innovation.

Industry: Internet Software & Services
Company size: 1K-5K employees
Founded: 2000

Description

  • Design, implement, and maintain SFT and RL post-training pipelines for multi-step coding agents.
  • Train and adapt LLMs for agent workflows, including planning, tool use, and multi-step interactions inside JetBrains IDEs.
  • Build evaluation and simulation environments where coding agents can perform and be measured on realistic developer tasks.
  • Design evaluation frameworks and metrics for agent behavior, analyze traces and logs, and feed results back into training, data, and reward design.
  • Analyze training and evaluation results to improve model architectures, training recipes, and datasets.
  • Work with large-scale infrastructure for distributed GPU training and MapReduce-style data processing for pre-training and fine-tuning datasets.
  • Collaborate with research, product, and infrastructure teams to turn product visions into models, experiments, and shipped features.

Requirements

  • Extensive hands-on experience training LLMs, whether in pre-training, fine-tuning, or post-training, in research or production settings.
  • Deep expertise in PyTorch and specialized LLM training stacks such as Megatron, NeMo, verl, or similar.
  • Strong understanding of LLM fundamentals, including architectures, tokenization, data pipelines, batching, mixed precision, distributed training, and debugging unstable runs.
  • Ability to own projects end to end from problem definition through design, experimentation, implementation, and iteration.
  • Product-aware mindset with the ability to translate product needs and failure modes into modeling and evaluation work.
  • At least 3 years of Python experience writing clean, maintainable code in modern ML codebases.
  • Experience with ML orchestrators, workflow tools, and cluster schedulers such as Kubeflow, Dagster, Airflow, ZenML, Kubernetes, or SLURM.
  • Experience with large-scale data and training pipelines, such as MapReduce-style clusters, multi-node GPU training, or workloads of 1M+ CPU/GPU hours.
  • Experience designing and maintaining evaluation pipelines for LLMs or agents, including metrics, dashboards, experiment tracking, and automated regression checks.
  • Experience with AI agent development, such as tool-using agents, planners, or multi-step coding workflows, and familiarity with agentic frameworks or patterns.
  • Experience with tools such as Weights & Biases, MLflow, Langfuse, or similar for experiment tracking and observability.
  • Experience with inference optimization and serving optimized models in production.
