Freelance Agent Evaluation Engineer

3 hours, 18 minutes ago
Part-time
Senior
Software Development
Mindrift.ai: Be the “I” in AI

Join 10,000+ experts earning $15-50/hr training AI models remotely. Flexible freelance work, weekly payments. No AI experience required. Apply in 5 minutes.

Internet Software & Services

Description

  • Build realistic virtual developer environments with codebases, infrastructure, and supporting context such as tickets, documentation, and conversations.
  • Design tasks from intermediate states of the environments, including crafting prompts and defining what counts as a valid solution.
  • Write tests that verify agent solutions by accepting valid approaches and rejecting incorrect ones.
  • Iterate on tasks and tests based on QA feedback, reviewing agent outputs and refining evaluations for fairness and robustness.
  • Analyze solution failures to identify where models struggle and use those insights to improve task difficulty and realism.
  • Ensure tasks are solvable by AI agents while still meaningfully challenging advanced coding models.
  • Submit completed tasks by the deadline and meet the stated acceptance criteria.
  • Collaborate through a project-based workflow that includes qualification, task completion, and review.

Requirements

  • 5+ years of experience in software development.
  • Experience with the core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, and Redis.
  • Experience writing functional and integration tests.
  • English proficiency at B2 level or higher.
  • Ability to work on project-based assignments rather than permanent employment.
  • Ability to estimate and complete tasks that take around 20 hours each, depending on complexity.
  • Understanding of how coding models fail in realistic development scenarios.
  • A CV submitted in English.

Benefits

  • Up to $30/hr equivalent compensation, depending on level and pace.
  • Flexible schedule with no fixed working hours; you choose when and how to work.
  • Project-based work with tasks estimated at about 20 hours each.
  • Opportunity to work on advanced AI evaluation projects for leading tech companies.
