Description

  • Design and build multi-agent benchmark tasks based on real-world code changes such as bug fixes, migrations, and refactors.
  • Work with the Harbor evaluation framework to run and validate tasks in containerized environments.
  • Write clear and precise task instructions, including file paths, function signatures, expected behavior, and constraints.
  • Develop Python-based verification scripts to validate the correctness of code changes.
  • Define task decomposition strategies across multiple specialized agents.
  • Analyze and navigate large open-source codebases to extract realistic task scenarios.
  • Run, debug, and refine tasks in Docker environments to ensure reproducibility.
  • Improve task quality, clarity, and difficulty based on evaluation results.

Requirements

  • 5+ years of software development experience, with strong proficiency in Python and JavaScript.
  • Strong experience working with large codebases built on frameworks such as Django, Flask, FastAPI, or Node.js.
  • Familiarity with Git workflows, including pull requests, diffs, commits, and cherry-picking.
  • Experience writing tests or validation scripts using pytest, unittest, or similar tools.
  • Ability to write clear and precise technical specifications.
  • Familiarity with AI coding benchmarks or evaluation frameworks such as SWE-bench or similar.
  • Hands-on experience with Docker, including Dockerfiles, image builds, and debugging.
  • Experience contributing to or maintaining open-source projects is preferred.
  • Experience with code migrations or large-scale refactoring is preferred.
  • Familiarity with CI/CD pipelines and automated testing workflows is preferred.
  • Exposure to LLM-based coding tools or evaluation frameworks is preferred.
  • Availability for 8 hours per day with 4 hours of overlap with PST.
  • Ability to work as a contractor for a 4+ week assignment.
  • Location in Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, or Vietnam.
