Freelance Agent Evaluation Engineer

4 hours, 17 minutes ago
Contract
Senior
Artificial Intelligence and Machine Learning
Mindrift.ai: Be the “I” in AI

Join 10,000+ experts earning $15-50/hr training AI models remotely. Flexible freelance work, weekly payments. No AI experience required. Apply in 5 minutes.

Internet Software & Services

Description

  • Build realistic virtual developer environments with codebases, infrastructure, and supporting context such as tickets, documentation, and conversations.
  • Design coding tasks from intermediate states of those environments and define clear success criteria for each task.
  • Ensure each task is solvable by an AI agent while remaining challenging for advanced models.
  • Write functional and integration tests that validate correct solutions and reject incorrect ones.
  • Review agent outputs, analyze failure cases, and refine tasks and tests based on QA feedback.
  • Iterate on evaluation materials until the task design is fair, robust, and reliable.
  • Account for multiple valid solution approaches so that tests are neither overly strict nor overly lenient.
  • Work within a project-based workflow to complete and submit tasks by the required deadline.

Requirements

  • 5+ years of experience in software development.
  • Experience with the core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, and Redis.
  • Experience writing functional and integration tests.
  • English proficiency at B2 level or higher.
  • CV must be submitted in English.
  • Ability to work on a project-based engagement rather than permanent employment.
  • Ability to complete tasks estimated at around 20 hours each.
  • Comfort working in realistic development environments and assessing AI-generated solutions.

Benefits

  • Up to $50/hr equivalent compensation, depending on level and pace.
  • Flexible schedule with self-directed working hours.
  • Project-based work with the ability to choose when and how to work.
  • Tasks estimated at approximately 20 hours each, allowing for concentrated short-term engagements.
  • Payment upon successful task completion under the project workflow.

Interested in this position?

Apply directly on the company website

Similar Roles

Senior QE Lead, Infrastructure

Airbnb 5K-10K Hotels, Restaurants & Leisure

Airbnb is seeking a Senior QE Lead to drive software quality across cross-functional product and engineering work for its globally distributed Infrastructure Quality team.

Agile Asana JIRA Scrum
4 hours, 2 minutes ago

AI Training Specialist (First-Person Video)

Toloka 251-1K Internet Software & Services

Remote project-based content capture role for a company training AI and robotics systems by recording everyday activities from a first-person perspective using a mobile app.

4 hours, 2 minutes ago

First-Person Video Recording for AI

Toloka 251-1K Internet Software & Services

Remote, flexible project to record everyday activities from home and contribute to AI and robotics training for an unspecified company.

4 hours, 2 minutes ago

Lead QA Analyst - Software Implementation & UAT

PartnerOne 51-250 Media

Mortgage Cadence is seeking a Lead QA Analyst to own UAT planning, implementation support, and release readiness for its cloud-based mortgage lending platform.

Azure JIRA
4 hours, 2 minutes ago
