Senior AI Engineer, Agentic Evaluation & V&V

3 weeks ago
Full-time
Senior
Software Development
Slingshot Aerospace

Slingshot.space is a cutting-edge technology company that specializes in providing innovative solutions for space exploration, satellite communication, and aerospace engineering. We offer a range of products and services including satellite launch serv...

Diversified Telecommunication Services
51-250
Founded 2017

Description

  • Extend and maintain Slingshot’s V&V SDK and evaluation framework for simulation-backed validation of agentic AI systems.
  • Design and implement agent-level and end-to-end evaluations, including benchmark scenarios, scoring logic, and experiment harnesses.
  • Build benchmark scenarios and tooling that measure planning, reasoning, and operational performance for autonomous mission planning systems.
  • Translate astrodynamics and mission-domain concepts into executable evaluation scenarios and simulation configurations.
  • Develop reusable SDK interfaces, adapters, and evaluation utilities that connect V&V systems, TALOS benchmarks, and agent workflows.
  • Define and apply metrics for capability evaluation, failure analysis, regression detection, and comparative benchmarking.
  • Partner with cross-functional teams to identify evaluation needs and improve coverage of critical capabilities.
  • Contribute to best practices for evaluating complex, autonomous AI systems.
  • Uphold strong engineering standards through testing, documentation, reproducibility, and maintainable system design.

Requirements

  • 6+ years of experience in software engineering, machine learning engineering, or applied AI, or equivalent experience.
  • Strong Python engineering skills with experience building SDKs, libraries, or evaluation tooling.
  • Experience designing evaluation frameworks, benchmarks, metrics, or test harnesses for AI/ML systems.
  • Ability to analyze system behavior, identify failure modes, and evaluate performance in complex autonomous or semi-autonomous systems.
  • Familiarity with modern agent frameworks, orchestration patterns, or protocol-based integrations.
  • Experience working in cross-functional, multidisciplinary teams.
  • Strong written and verbal communication skills.
  • Bachelor’s degree in a relevant science or engineering field, or equivalent experience.
  • Must be a U.S. citizen and eligible to obtain and maintain a government security clearance.
  • Preferred: Experience in autonomous systems such as self-driving or ADAS, including perception, planning, simulation, or safety validation.
  • Preferred: Experience developing or evaluating agentic AI systems, including multi-step, tool-using, or autonomous workflows.
  • Preferred: Experience with reinforcement learning systems and simulation-based evaluation.
  • Preferred: Familiarity with benchmark design, experiment tracking, and trace-based evaluation workflows.
  • Preferred: Experience with orchestration frameworks such as LangGraph or similar tools.
  • Preferred: Knowledge of astrodynamics, orbital mechanics, or spacecraft mission planning.
  • Preferred: Experience translating mission or operational concepts into measurable evaluation scenarios.
  • Preferred: Familiarity with physics-based simulation, trajectory analysis, or space-domain modeling.
  • Preferred: Experience with observability and experiment tooling such as MLflow, Opik, or similar platforms.
  • Preferred: Experience transitioning advanced research systems into production environments.

Benefits

  • Remote, U.S.-based work location.
  • Salary range of $150,000 to $250,000.
  • Full-time exempt classification.
  • Opportunity to work on mission-critical space safety and security applications.
  • Equal Opportunity Employer committed to equity, diversity, and inclusion.
