AI Evaluation Engineer - Mathematics & Algorithms

4 hours, 42 minutes ago

Description

  • Design and build multi-agent benchmark tasks that require multi-step mathematical reasoning and algorithmic problem-solving.
  • Create complex, decomposable problems in areas such as competition mathematics, numerical analysis, combinatorial optimization, and statistical inference.
  • Develop verification scripts to validate numerical outputs, proof correctness, logical steps, and algorithmic constraints.
  • Write clear, structured problem statements with precise notation and well-defined outputs.
  • Design task decomposition strategies for parallel or multi-agent execution.
  • Implement computational solutions and validation pipelines using Python.
  • Work in containerized environments such as Docker to support reproducibility and evaluation.

Requirements

  • 5+ years of experience in mathematics, quantitative research, or computational science.
  • Strong Python skills for scientific computing with NumPy, SciPy, SymPy, or similar tools.
  • Experience solving or designing complex mathematical and algorithmic problems.
  • Ability to create precise, verifiable outputs and avoid subjective problem formats.
  • Experience with mathematical proofs or formal reasoning.
  • Familiarity with AI benchmarks or evaluation frameworks, such as SWE-bench.
  • Comfort working in Docker environments.
  • Solid understanding of numerical methods, including precision, convergence, and error bounds.
