Senior AI Agent Evaluation Engineer

3 hours, 13 minutes ago
Part-time
Senior
Artificial Intelligence and Machine Learning
Mindrift.ai: Be the “I” in AI

Join 10,000+ experts earning $15-50/hr training AI models remotely. Flexible freelance work, weekly payments. No AI experience required. Apply in 5 minutes.

Internet Software & Services

Description

  • Build realistic virtual developer environments with codebases, infrastructure, and supporting context such as tickets, documentation, and conversations.
  • Design tasks from intermediate states of the simulated environments and define what counts as a correct solution.
  • Create prompts that are challenging but solvable by an AI agent.
  • Write tests that validate agent solutions, accepting all valid approaches and rejecting incorrect ones.
  • Review agent solutions, analyze failures, and iterate on tasks and tests based on QA feedback.
  • Refine evaluation criteria until the task design is fair and robust.
  • Identify scenarios that reveal the strengths and weaknesses of frontier coding models.
  • Submit completed tasks by the deadline and meet the required acceptance criteria.

Requirements

  • 5+ years of software development experience.
  • Experience with Python and FastAPI.
  • Experience with JavaScript or TypeScript and React.
  • Experience with Docker, Postgres, Kafka, and Redis.
  • Experience writing functional and integration tests.
  • English proficiency at B2+ level.
  • Ability to work on a project-based, non-permanent basis.
  • Willingness to complete qualification steps before joining a project.

Benefits

  • Up to $50/hr equivalent compensation, depending on level and pace.
  • Flexible schedule with the ability to choose when and how to work.
  • Project-based work, with tasks estimated at around 20 hours each.
  • Payment for completed tasks upon acceptance.

