Research Engineer (Agentic Behavior – Kotlin AI Value Stream)

1 day, 18 hours ago
JetBrains

JetBrains builds development tools such as IntelliJ IDEA and created the Kotlin programming language. Its tools automate routine tasks to boost developer productivity and foster innovation.

Internet Software & Services
1K-5K
Founded 2000

Description

  • Design and implement tools to capture, classify, and analyze AI agent errors in Kotlin code generation.
  • Build observability pipelines over agent traces from JetBrains IDEs, Junie, Claude Code, Cursor, and other coding agents.
  • Design, implement, and maintain evaluation pipelines for Kotlin code generation quality across correctness, idiomaticity, build success, framework usage, and test coverage.
  • Build simulation environments that measure coding agents on realistic Kotlin developer tasks such as KMP projects, Gradle dependency management, and Java-to-Kotlin migrations.
  • Own evaluation infrastructure, including metrics, experiment tracking, automated regression checks, and reproducible benchmarking.
  • Experiment with post-training methods such as SFT, DPO, and GRPO to improve Kotlin-specific model behavior.
  • Investigate context engineering approaches including CLAUDE.md/AGENTS.md files, compiler-as-verifier feedback loops, Kotlin LSP integration, and MCP-based tooling.
  • Run A/B tests, benchmark suites, and before/after analyses to measure the impact of changes on real codebases.
  • Collaborate with model providers such as Anthropic, OpenAI, and Google to translate Kotlin-specific findings into model improvements.
  • Design and build open-source Kotlin benchmarks intended to become a reference for measuring AI coding agent performance across the ecosystem.

Requirements

  • Hands-on experience building evaluation or analysis pipelines for LLMs or AI coding agents in research or production.
  • At least 3 years of strong Python engineering experience in data-heavy or ML-adjacent codebases.
  • Experience with data analysis at scale, including SQL (for example, on Amazon Athena), data pipelines, and statistical analysis of experimental results.
  • Ability to own projects end to end, from identifying failures to designing evals, running experiments, and shipping fixes.
  • Product-aware mindset focused on how developers actually use agents and how failure modes translate into evaluation and training work.
  • Familiarity with Kotlin or a strong willingness to develop deep Kotlin expertise.
  • Experience post-training LLMs with methods such as SFT, RLHF, DPO, or GRPO, or designing the data and reward pipelines that support them.
  • Experience with modern deep learning frameworks and LLM training stacks such as PyTorch, TRL, verl, or Megatron.
  • Experience developing tool-using agents, multi-step coding workflows, or agentic frameworks.
  • Experience with evaluation frameworks or tools such as Inspect AI, Promptfoo, or LM-evaluation-harness.
  • Experience with experiment tracking and observability tools such as Weights & Biases, MLflow, or Langfuse.
  • Experience with the Kotlin ecosystem, including Android, Gradle, KMP, Spring, or Ktor, is preferred.
  • Contributing to or maintaining open-source projects, especially benchmarks or evaluation tools, is preferred.

Benefits

  • Competitive base salary that reflects skills and experience.
  • Flexible work location with the option to work from home or from the office.
  • Up to 30 days per year of remote work from abroad.
  • Extra time off to relax and recharge.
  • Medical insurance allowance for employees and their families.
  • Learning and development support, including conferences, courses, and language classes.
  • Relocation support for a smoother move.
  • Meals on workdays, either through a hot meal or lunch allowance.
  • Mental health support with access to professional services.
  • Sports benefits such as an on-site gym or sports club stipend.

Similar Roles

Staff Applied Research Engineer, Biometrics

Veriff 51-250 IT Services

Veriff’s Biometric Solutions team is hiring a computer vision applied researcher to accelerate the development of biometric and liveness defenses that protect the automated identity verification funnel.

Bash Computer Vision Machine Learning OpenCV Python PyTorch SQL TensorFlow
1 day, 12 hours ago

Staff AI Engineer - Notebooks

Datadog 5K-10K IT Services

Datadog is hiring a Staff Software Engineer to lead AI/ML initiatives for its Notebooks product, building intelligent workflows that support data analysis, investigations, and technical documentation.

Machine Learning
1 day, 12 hours ago

Senior Staff Engineer, AI

AlphaSense 251-1K Internet Software & Services

AlphaSense is hiring a Senior Staff AI Engineer to lead the design and implementation of AI systems that extract insights from millions of unstructured documents and multimedia files across its content intelligence platform.

CI/CD Computer Vision Django Docker FastAPI Kubernetes LLM Machine Learning MLOps Spring Boot
1 day, 14 hours ago

Research Intern - Reinforcement Learning, Self-Driving

Applied Intuition 251-1K Internet Software & Services

Applied Intuition is hiring multiple Research Interns to help advance next-generation physical AI through research in autonomous driving and robotics.

Computer Vision Machine Learning Reinforcement Learning
1 day, 16 hours ago
