Applied Research Scientist, LLM Evaluation & Post-Training

6 hours, 6 minutes ago
Full-time
Senior
Software Development
Innodata

Innodata Inc. is a global leader in data engineering, offering end-to-end AI solutions and platforms for businesses worldwide, combining AI and human expertise to solve complex data challenges.

IT Services
1K-5K
Founded 1988

Description

  • Define and execute a research agenda focused on LLM evaluation and evaluation-driven model improvement.
  • Design rigorous experiments to study how evaluation methods affect fine-tuning and post-training outcomes.
  • Develop and validate evaluation frameworks for LLM and multimodal systems, including benchmarks, scoring methods, judge-assisted evaluation, human evaluation protocols, and stress testing.
  • Lead research on advanced evaluation areas such as long-context, cross-modal, and dynamic multi-turn evaluation.
  • Analyze model behavior and failure patterns to identify actionable model improvement opportunities.
  • Compare existing evaluation techniques and propose improved methods with clear validity and scalability tradeoffs.
  • Collaborate with AI/ML Research Engineers to translate research methods into scalable pipelines.
  • Partner with Language Data Scientists to integrate human-in-the-loop and synthetic evaluation strategies.
  • Engage customer technical stakeholders to review methodology and provide expert recommendations.
  • Produce technical documentation, internal research reports, and client-facing materials explaining methods, results, assumptions, and limitations.

Requirements

  • MS or PhD in Computer Science, Machine Learning, Statistics, Applied Mathematics, AI, or a related quantitative scientific field; PhD strongly preferred.
  • 5+ years of relevant applied research or research science experience in ML/AI, with substantial work in LLMs or foundation models.
  • Demonstrated experience with LLM evaluation, benchmarking, alignment, post-training, or model quality research.
  • Strong foundation in experimental design, statistical analysis, and scientific reasoning for ML systems.
  • Strong Python coding skills for research experimentation and analysis, including data processing, evaluation pipelines, statistical analysis, and visualization.
  • Experience with modern ML frameworks and tooling such as PyTorch, Hugging Face, and JAX/TensorFlow as applicable.
  • Ability to evaluate human and automated evaluation methods, including tradeoffs in cost, reliability, validity, and scalability.
  • Experience designing reproducible evaluation studies and protocols across datasets, model versions, and evaluation runs.
  • Ability to collaborate directly with research scientists, ML engineers, data scientists, and customer technical stakeholders.
  • Strong communication skills with the ability to present nuanced technical conclusions, assumptions, and limitations clearly.

Benefits

  • Competitive salary range of $175,000 to $225,000 USD per year, based on experience, skills, and qualifications.
  • Opportunity to work on cutting-edge GenAI research at a global data engineering company.
  • Work on research with direct impact on customer solutions and internal platform innovation.
  • Collaborative environment with researchers, engineers, and language/data operations teams.
