Freelance Agent Evaluation Engineer

2 hours, 34 minutes ago
Part-time
Senior
Software Development
Mindrift.ai: Be the “I” in AI

Join 10,000+ experts earning $15-50/hr training AI models remotely. Flexible freelance work, weekly payments. No AI experience required. Apply in 5 minutes.

Internet Software & Services

Description

  • Build realistic virtual developer environments with codebases, infrastructure, tickets, documentation, and conversation context.
  • Design tasks from intermediate states of the simulated environments and define what a correct solution looks like.
  • Write tests that validate agent solutions, accepting all correct approaches and rejecting incorrect ones.
  • Review agent outputs, analyze failures, and refine tasks and tests based on QA feedback.
  • Iterate on evaluation criteria until the dataset is fair, robust, and realistic.
  • Ensure tasks are challenging for frontier AI coding models while remaining solvable.
  • Work within project-based engagements rather than permanent employment.

Requirements

  • 5+ years of experience in software development.
  • Experience with a core stack including Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, and Redis.
  • Experience writing functional and integration tests.
  • English proficiency at B2 level or higher.
  • Submit your CV in English.
  • Ability to complete project tasks by the deadline and meet acceptance criteria.
  • Comfort working in a project-based, independent work model.
  • Understanding of how software systems fail in real-world scenarios (preferred).

Benefits

  • Compensation equivalent to up to $40/hr, depending on level and pace.
  • Approximately 20 hours of work per task, with flexible scheduling.
  • Choose when and how to work.
  • Project-based paid opportunities with leading tech companies.
  • Opportunity to work on cutting-edge AI evaluation projects.
