AI Evaluation Engineer (Knowledge & Research)

7 hours, 31 minutes ago

Description

  • Build multi-agent benchmark tasks that require reading, analyzing, and synthesizing large document collections.
  • Curate real-world research corpora, including academic papers, case studies, and technical reports, and design questions that require comprehensive analysis.
  • Write structured ground-truth oracles in JSON with specific, verifiable answers tied to the source material.
  • Design LLM judge prompts that evaluate agent output field by field against the oracle.
  • Create decomposition guides that split research across multiple parallel sub-agents and then synthesize the results.
  • Develop datasets and evaluation frameworks for benchmarking next-generation AI systems.
  • Translate research content into measurable evaluation tasks with high precision and clear scoring criteria.
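To make the oracle-and-judge workflow above concrete, here is a minimal sketch of a JSON ground-truth oracle and a field-by-field comparison against an agent's answer. All field names and values are hypothetical, invented for illustration; a real LLM judge would apply semantic matching per field rather than exact equality.

```python
import json

# Hypothetical ground-truth oracle: specific, verifiable answers
# tied to the source material (field names are illustrative only).
oracle = {
    "primary_finding": "Treatment A outperformed placebo (p < 0.01)",
    "sample_size": 412,
    "study_design": "randomized controlled trial",
}

# A hypothetical agent answer to be scored against the oracle.
agent_answer = {
    "primary_finding": "Treatment A outperformed placebo (p < 0.01)",
    "sample_size": 412,
    "study_design": "cohort study",
}

def score(oracle: dict, answer: dict) -> dict:
    """Compare the answer to the oracle field by field.

    Returns per-field pass/fail plus an overall fraction correct.
    """
    results = {key: answer.get(key) == value for key, value in oracle.items()}
    results["score"] = sum(results.values()) / len(oracle)
    return results

print(json.dumps(score(oracle, agent_answer), indent=2))
```

Exact-match scoring is only a baseline; in practice, a judge prompt would grade each field with tolerance for paraphrase while still requiring the specific, verifiable facts the oracle encodes.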

Requirements

  • 5+ years of research experience (academic or industry) in a scientific, technical, or analytical domain.
  • Strong ability to read, analyze, and extract structured information from unstructured documents.
  • Experience designing or working with structured data formats such as JSON, schemas, and validation.
  • Proficiency in Python scripting for data processing, validation, or evaluation scripts.
  • Experience with AI evaluation, coding benchmarks, or structured reasoning tasks such as SWE-bench, Terminal-bench, or similar.
  • Experience working with Docker, including building images and debugging containers.
  • Strong attention to detail when defining exact, verifiable outputs.
  • Ability to design complex, multi-step problem-solving workflows.
  • High analytical thinking and structured problem decomposition skills.
  • Availability for 8 hours per day with 4 hours of overlap with PST.
  • Availability for a contractor assignment of 5+ weeks.
  • Location in one of the supported countries: Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, or Vietnam.


Similar Roles

AI Evaluation & Annotation Specialist (Entry-Mid Level) - Italian (Global)

Volga Partners · 51-250 employees · Internet Software & Services

AI Evaluation & Annotation Specialists at an AI-focused company will review, annotate, and assess LLM outputs to improve accuracy and consistency in production workflows.

LLM · Machine Learning
1 hour, 12 minutes ago

Senior Consultant (MBB & Top-Tier Firms) - Freelance AI Project

Mindrift.ai: Be the “I” in AI · Internet Software & Services

Mindrift, powered by Toloka, is hiring experienced top-tier strategy consultants to help design realistic management consulting learning environments and evaluation frameworks for AI systems.

1 hour, 15 minutes ago

Optical Engineer - Freelance AI Trainer

Mindrift.ai: Be the “I” in AI · Internet Software & Services

Mindrift is seeking optical engineers and physics specialists for project-based AI work focused on testing, evaluating, and improving AI systems through the creation of original, research-style optics and physics problems.

2 hours ago

Optical Engineer - Freelance AI Trainer

Mindrift.ai: Be the “I” in AI · Internet Software & Services

Mindrift is seeking optical and physics specialists for project-based AI work focused on creating and validating challenging physics problems for leading tech companies.

2 hours, 14 minutes ago
