RAG and Evaluation Engineer

2 weeks, 1 day ago
Full-time
Mid Level
Data Science and Analytics
LTS

Internet Software & Services
251-1K employees
Founded 2005

Description

  • Own the knowledge surface by building ingestion pipelines for source code, structured metadata, technical documentation, patches, and other customer-provided corpora.
  • Own retrieval quality across chunking, embeddings, hybrid retrieval, reranking, and freshness.
  • Own the evaluation harness for translation accuracy, dependency-map correctness, and overall agent quality.
  • Run A/B tests and regression detection across prompts, retrieval, and model changes.
  • Close the feedback loop by using production usage signals to improve evals and retrieval.
  • Define success metrics and determine whether the agent is actually improving, even before the team has a clear baseline.
  • Pair with Agent Engineers on the prompt-and-eval iteration cycle.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, Information Science, or a related field, plus 4 years of professional software engineering experience; equivalent experience may substitute for the degree requirement.
  • Production experience shipping a RAG system with measurable quality.
  • Experience with retrieval pipelines, including ingestion, chunking, embedding, hybrid retrieval, and reranking.
  • Strong applied evaluation skills, including benchmark design, regression detection, and LLM-as-judge patterns.
  • Ability to work in a fast-paced, collaborative environment.
  • Heavy, native use of AI tooling, including running agents in parallel and model-as-collaborator workflows.
  • Strong TypeScript or Python skills.
  • Demonstrated experience in a remote work environment.
  • Ability to measure shipped systems against benchmarks, with data-backed opinions on chunking and retrieval.
  • Comfort defining metrics before the team has fully aligned on them.
  • Nice to have: code-as-corpus retrieval experience.
  • Nice to have: applied IR or search-engine background.
  • Nice to have: synthetic data generation and LLM-as-judge experience.
  • Nice to have: open-source contributions to retrieval, evaluation, or RAG tooling.
  • Nice to have: experience integrating retrieval feedback loops with production usage.
  • Nice to have: healthcare IT or legacy modernization domain experience.
  • Nice to have: public technical writing or conference talks on retrieval or evaluation.

Benefits

  • Opportunity to support high-visibility federal missions in IT and healthcare.
  • A culture that values innovation, growth, collaboration, and quality.
  • Access to cutting-edge tools and technologies.
  • Comprehensive benefits for employees and their families.
  • A career path that rewards ambition and performance.
  • Salary transparency with compensation ranges shared upfront.
