Freelance Agent Evaluation Engineer

9 hours, 29 minutes ago
Contract
Senior
Software Development
Mindrift.ai: Be the “I” in AI

Join 10,000+ experts earning $15-50/hr training AI models remotely. Flexible freelance work, weekly payments. No AI experience required. Apply in 5 minutes.

Internet Software & Services

Description

  • Build realistic developer environments that include a virtual company, codebase, infrastructure, and supporting context such as tickets, documentation, and conversations.
  • Design tasks from intermediate states of these environments: write the prompt, define the success criteria, and confirm that the task is solvable by an AI agent.
  • Write tests that verify agent solutions and distinguish valid approaches from incorrect ones.
  • Review agent solutions, analyze failures, and refine tasks and tests based on QA feedback.
  • Iterate on evaluation criteria until the assessment is fair, robust, and challenging for frontier AI models.
  • Collaborate in a project-based workflow to complete assigned tasks and submit them by the deadline.
  • Guide and evaluate AI-generated code rather than writing most of the code from scratch.

Requirements

  • 5+ years of experience in software development.
  • Experience with the core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, and Redis.
  • Experience writing tests, including functional and integration tests.
  • English proficiency at B2 level or higher.
  • Submit a CV in English and indicate your English proficiency.
  • Ability to work on a project-based, non-permanent opportunity.
  • Ability to complete tasks estimated at about 20 hours each, depending on complexity.
  • Understanding of how to evaluate model failures and identify scenarios that distinguish strong from weak solutions.

Benefits

  • Up to $40/hr equivalent compensation, depending on level and pace.
  • Flexible schedule with self-directed work hours.
  • Project-based work where you choose when and how to work.
  • Tasks estimated at approximately 20 hours each.
  • Opportunity to work on AI evaluation projects for leading tech companies.
