AI Evaluation Engineer - Planning & Operations

2 weeks, 6 days ago
Contract
Senior
Artificial Intelligence and Machine Learning

Description

  • Design and build multi-agent benchmark tasks for planning, scheduling, and resource allocation scenarios.
  • Create operational decision-making problems in areas such as logistics, project planning, incident response, and capacity planning.
  • Develop constraint-rich problem statements with multiple interacting variables, dependencies, and timelines.
  • Build verification scripts to assess feasibility, completeness, and optimality of solutions.
  • Define task decomposition strategies across specialized sub-agents focused on allocation, constraint resolution, and optimization.
  • Model realistic operational systems with constraints and interdependencies.
  • Implement validation logic and evaluation pipelines using Python.
  • Work with Docker environments to ensure reproducible task execution.
  • Collaborate with internal teams to improve task quality, coverage, and evaluation rigor.

Requirements

  • 5+ years of experience in operations, project management, logistics, or supply chain management.
  • Strong ability to formalize constraints, dependencies, and scheduling logic.
  • Proficiency in Python for building validation and verification scripts.
  • Experience with optimization techniques such as linear programming, constraint satisfaction, or scheduling algorithms.
  • Strong structured problem-solving and decomposition skills.
  • Experience with AI benchmarks or evaluation frameworks such as SWE-bench.
  • Hands-on experience with Docker, including Dockerfiles, image builds, and debugging.
  • Background in operations research or optimization-heavy domains is preferred.
  • Experience with simulation or modeling tools is preferred.
  • Familiarity with AI planning systems or automated reasoning is preferred.
  • Project management experience or certifications (such as PMP or an Agile certification) are preferred.

Interested in this position?

Apply directly on the company website