Innodata

Innodata Inc. is a global leader in data engineering, offering end-to-end AI solutions and platforms for businesses worldwide, combining AI and human expertise to solve complex data challenges.

IT Services

Information Technology

1K-5K (4209)

Founded 1988

24 open positions

Applied Research Scientist, LLM Evaluation & Post-Training

1 month, 1 week ago

United States

Full-time

Senior

Research Scientist

Software Development

Hugging Face, LLM, Machine Learning, Python, PyTorch, Statistics, TensorFlow

Description

Define and execute a research agenda focused on LLM evaluation and evaluation-driven model improvement.
Design rigorous experiments to study how evaluation methods affect fine-tuning and post-training outcomes.
Develop and validate evaluation frameworks for LLM and multimodal systems, including benchmarks, scoring methods, judge-assisted evaluation, human protocols, and stress testing.
Lead research on advanced evaluation areas such as long-context, cross-modal, and dynamic multi-turn evaluation.
Analyze model behavior and failure patterns to identify actionable model improvement opportunities.
Compare existing evaluation techniques and propose improved methods with clear validity and scalability tradeoffs.
Collaborate with AI/ML Research Engineers to translate research methods into scalable pipelines.
Partner with Language Data Scientists to integrate human-in-the-loop and synthetic evaluation strategies.
Engage customer technical stakeholders to review methodology and provide expert recommendations.
Produce technical documentation, internal research reports, and client-facing materials explaining methods, results, assumptions, and limitations.

Requirements

MS or PhD in Computer Science, Machine Learning, Statistics, Applied Mathematics, AI, or a related quantitative scientific field; PhD strongly preferred.
5+ years of relevant applied research or research science experience in ML/AI, with substantial work in LLMs or foundation models.
Demonstrated experience with LLM evaluation, benchmarking, alignment, post-training, or model quality research.
Strong foundation in experimental design, statistical analysis, and scientific reasoning for ML systems.
Strong Python coding skills for research experimentation and analysis, including data processing, evaluation pipelines, statistical analysis, and visualization.
Experience with modern ML frameworks and tooling such as PyTorch, Hugging Face, and JAX/TensorFlow as applicable.
Ability to evaluate human and automated evaluation methods, including tradeoffs in cost, reliability, validity, and scalability.
Experience designing reproducible evaluation studies and protocols across datasets, model versions, and evaluation runs.
Ability to collaborate directly with research scientists, ML engineers, data scientists, and customer technical stakeholders.
Strong communication skills with the ability to present nuanced technical conclusions, assumptions, and limitations clearly.

Benefits

Competitive salary range of $175,000 to $225,000 USD per year, based on experience, skills, and qualifications.
Opportunity to work on cutting-edge GenAI research at a global data engineering company.
Work on research with direct impact on customer solutions and internal platform innovation.
Collaborative environment with researchers, engineers, and language/data operations teams.

Similar Roles

Senior Sensor Simulation Engineer, Radar

Parallel Domain 51-250 Aerospace & Defense

Parallel Domain is hiring a Senior Sensor Simulation Engineer to lead the development and validation of high-fidelity sensor models for autonomous system simulation, with a primary focus on radar and broader multi-modal sensing.

North America Full-time Senior Research Scientist

$155k-$175k

C++, Git, Machine Learning, Python, Unreal Engine

53 minutes ago

Principal Research Scientist, Evidence & Strategy

Avalere Health Professional Services

Avalere Health is hiring a Principal Research Scientist for its Evidence & Strategy Practice to lead healthcare data analysis and evidence generation that informs client strategy across the pharmaceutical and broader healthcare industry.

United States Full-time Lead Research Scientist

$168k-$215k

SQL

1 hour, 8 minutes ago

Performance Engineer (C++, Python, Rust)

Weekday 11-50 Construction & Engineering

Performance Engineer needed for a remote AI research engagement with one of the client’s teams, focused on improving the quality of training and evaluation data for frontier large language models through systems and performance optimization work.

United States Contract Junior Research Scientist

$0k-$0k

C++, Machine Learning, Python, Rust

1 hour, 23 minutes ago

Senior Reverse Engineer - Remote

Zyte 251-1K Professional Services

Zyte is hiring a reverse engineer to build automated tooling and AI-assisted pipelines for solving web antibot and fingerprinting challenges at scale.

Croatia Full-time Senior Research Scientist Software Engineer

Burp Suite, JavaScript, LLM, Machine Learning, Node.js, Python, Rust, Wireshark

2 days, 1 hour ago
