AI Evaluation Engineer - Planning & Operations

Contract
Senior
Artificial Intelligence and Machine Learning

Description

  • Design and build multi-agent benchmark tasks for planning, scheduling, and resource allocation scenarios.
  • Create operational decision-making problems in areas such as logistics, project planning, incident response, and capacity planning.
  • Develop constraint-rich problem statements with multiple interacting variables, dependencies, and timelines.
  • Build verification scripts that assess the feasibility, completeness, and optimality of solutions.
  • Define task decomposition strategies across specialized sub-agents focused on allocation, constraint resolution, and optimization.
  • Model realistic operational systems with constraints and interdependencies.
  • Implement validation logic and evaluation pipelines using Python.
  • Work with Docker environments to ensure tasks build and execute reproducibly.
  • Collaborate with internal teams to improve task quality, coverage, and evaluation rigor.

Requirements

  • 5+ years of experience in operations, project management, logistics, or supply chain.
  • Strong ability to formalize constraints, dependencies, and scheduling logic.
  • Proficiency in Python for building validation and verification scripts.
  • Experience with optimization techniques such as linear programming, constraint satisfaction, or scheduling algorithms.
  • Strong structured problem-solving and decomposition skills.
  • Experience with AI benchmarks or evaluation frameworks such as SWE-bench.
  • Hands-on experience with Docker, including Dockerfiles, image builds, and debugging.
  • Background in operations research or optimization-heavy domains is preferred.
  • Experience with simulation or modeling tools is preferred.
  • Familiarity with AI planning systems or automated reasoning is preferred.
  • Project management experience or a certification such as PMP or an Agile credential is preferred.

Similar Roles

AI/ML Data Contributor

TSMG Professional Services

AI/ML Data Contributor role with a company supporting active and upcoming machine learning projects across the United States, focused on task-based data and testing work in remote and occasional on-site settings.

Machine Learning

Synthetic Data Engineer (AI Data/Training)

Hyphen Connect 1-10 staffing & recruiting

Synthetic Data Engineer at an organization building domain-specific synthetic data generation pipelines and data workflows that support model training.

Apache Airflow Apache Spark

Freelance Annotator (English) - AI Trainer

Toloka 251-1K Internet Software & Services

Toloka is seeking freelance AI annotators to support project-based online tasks that help train and improve generative AI through data review, labeling, and evaluation.

Generative AI

AI/ML Data Contributor

TSMG Professional Services

AI/ML Data Contributor is a remote, task-based contract role with a U.S.-based company supporting machine learning projects through data collection, evaluation, and testing.

Machine Learning
