Freelance Agent Evaluation Engineer

10 hours, 23 minutes ago
Part-time
Senior
Software Development
Mindrift.ai: Be the “I” in AI

Join 10,000+ experts earning $15-50/hr training AI models remotely. Flexible freelance work, weekly payments. No AI experience required. Apply in 5 minutes.

Internet Software & Services

Description

  • Build realistic developer environments, including a virtual company, codebase, infrastructure, and supporting context such as tickets, documentation, and conversations.
  • Design tasks from intermediate states of these environments, including crafting prompts and defining what counts as a solved task.
  • Ensure tasks are solvable by an AI agent while still being challenging for frontier models.
  • Write functional and integration tests that validate agent solutions and distinguish correct from incorrect approaches.
  • Review agent solutions, analyze failures, and refine tasks and tests based on QA feedback.
  • Iterate on evaluation criteria so that grading is fair and robust, neither overly strict nor overly lenient.
  • Work within project-based assignments rather than permanent employment, completing tasks by the required deadline.
  • Submit completed tasks that meet the specified acceptance criteria.

Requirements

  • 5+ years of experience in software development.
  • Experience with the core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, and Redis.
  • Experience writing tests, including functional and integration tests.
  • English proficiency at B2 level or higher.
  • CV must be submitted in English.
  • Ability to complete approximately 20 hours of work per task, depending on complexity.
  • Ability to work on a project-based, deadline-driven basis.
  • An understanding of failure modes in AI coding agents is relevant to the role.
  • This opportunity is not for data labeling, prompt engineering, or writing code from scratch.

Benefits

  • Up to $30/hr equivalent compensation, depending on level and pace.
  • Flexible schedule with the ability to choose when and how to work.
  • Project-based work rather than permanent employment.
  • Estimated 20 hours per task, allowing for concentrated short-term engagements.
  • Payment upon successful completion of tasks that meet acceptance criteria.


