AI Evaluation Engineer - Mathematics & Algorithms

2 weeks, 5 days ago

Description

  • Design and build multi-agent benchmark tasks that require multi-step mathematical reasoning and algorithmic problem-solving.
  • Create complex, decomposable problems in areas such as competition mathematics, numerical analysis, combinatorial optimization, and statistical inference.
  • Develop verification scripts to validate numerical outputs, proof correctness, logical steps, and algorithmic constraints.
  • Write clear, structured problem statements with precise notation and well-defined outputs.
  • Design task decomposition strategies for parallel or multi-agent execution.
  • Implement computational solutions and validation pipelines using Python.
  • Work in containerized environments (e.g., Docker) so that tasks and evaluations are reproducible.

Requirements

  • 5+ years of experience in mathematics, quantitative research, or computational science.
  • Strong Python skills for scientific computing, including NumPy, SciPy, SymPy, or similar tools.
  • Experience solving or designing complex mathematical and algorithmic problems.
  • Ability to design problems with precise, verifiable outputs and to avoid subjective formats.
  • Experience with mathematical proofs or formal reasoning.
  • Familiarity with AI benchmarks or evaluation frameworks, such as SWE-bench.
  • Comfort working in Docker environments.
  • Solid understanding of numerical methods, including precision, convergence, and error bounds.
