AI Evaluation Engineer (Agentic Coding / Software Engineering)

Description

  • Execute coding tasks within agentic coding environments while following strict evaluation protocols.
  • Review and evaluate model-generated code trajectories for correctness and completeness.
  • Validate outputs by reading code, running tests, analyzing logs, and inspecting artifacts.
  • Perform targeted validation using scripts, tests, and manual checks.
  • Write clear, evidence-based rationales for evaluations and rankings.
  • Design realistic, multi-step coding tasks and workflows suitable for offline evaluation work.
  • Create and refine evaluation rubrics and scoring criteria.
  • Ensure consistency, quality, and compliance across evaluations.
  • Identify issues in environments, instructions, or workflows and report them with clear evidence.

Requirements

  • 5+ years of experience in software engineering, QA, developer tooling, or similar code-heavy roles.
  • Strong proficiency in at least one programming ecosystem, such as Python, JavaScript/TypeScript, Java, C/C++, Rust, or SQL.
  • Ability to read and understand unfamiliar codebases and implement or debug changes.
  • Experience running and interpreting tests, scripts, and CLI tools.
  • Strong debugging and problem-solving skills, including handling edge cases.
  • Comfortable working in Linux and terminal environments.
  • Familiarity with Git workflows and standard development tooling.
  • Experience with AI coding tools or agentic coding environments such as Cursor, Claude Code, or similar.
  • Strong attention to detail and ability to produce consistent, high-quality evaluations.


