AI Query Evaluation Specialist (Copilot Competitive Intelligence)

1 hour, 39 minutes ago
Full-time
Entry Level
Data Science and Analytics
Blueprint Technologies

Blueprint Technologies specializes in delivering tailored business management and IT solutions that optimize cloud spending, enhance productivity, and drive innovation across various industries, including manufacturing, retail, finance, and healthcare.

Internet Software & Services
251-1K
Founded 2013

Description

  • Review and analyze real user query logs to identify queries that have clear intent and are representative of English-speaking markets.
  • Curate and maintain high-quality datasets used to evaluate AI systems such as Microsoft Copilot, ChatGPT, and Gemini.
  • Annotate queries across multiple evaluation dimensions, including web search needs, PII presence, domain expertise requirements, and other response-quality attributes.
  • Ensure annotations are consistent, structured, and aligned with evolving evaluation guidelines and safety protocols.
  • Use tools such as Excel to organize, review, and summarize evaluation outputs.
  • Identify trends in query patterns and provide feedback to improve dataset coverage and quality.
  • Maintain a high level of attention to detail, documentation quality, and evaluation integrity.

Requirements

  • Strong English reading comprehension with the ability to interpret subtle differences in user intent.
  • Demonstrated analytical thinking and logical reasoning skills.
  • Experience working with structured data or annotation workflows.
  • Familiarity with tools such as Microsoft Excel or similar data analysis tools.
  • Strong user empathy and understanding of how diverse users formulate queries.
  • Curiosity about and familiarity with modern AI tools such as Copilot, ChatGPT, and Gemini.
  • High attention to detail with a track record of delivering consistent, high-quality work.
  • Reliable, proactive, and adaptable in fast-changing environments.
  • Prior experience in data annotation, content evaluation, or dataset curation for AI or search products (preferred).
  • Experience with AI evaluation, search relevance, or linguistic analysis (preferred).
  • Basic statistical or data analysis knowledge (preferred).
  • Demonstrated ability to quickly learn and interpret unfamiliar domains (preferred).
  • Fluency in English plus at least one additional language, such as Japanese, Korean, French, Chinese, German, or Italian (preferred).

Benefits

  • Medical, dental, and vision coverage.
  • Flexible Spending Account.
  • 401(k) program.
  • Competitive PTO offerings.
  • Parental leave.
  • Opportunities for professional growth and development.
  • Remote U.S.-based work arrangement.
  • Annual salary range of $80,000 to $95,000 USD, with a midpoint of $87,500.
