Researcher, Evaluations

10 hours, 3 minutes ago
Full-time
Mid Level
Data Science and Analytics
Epoch AI

Epoch is a research institute focused on exploring critical trends and questions that influence the development and governance of artificial intelligence, providing valuable insights into its societal implications.

Professional Services
1-10
Founded 2022

Description

  • Create and curate an evaluation suite of realistic real-world tasks and update it as AI capabilities evolve.
  • Design grading rubrics for evaluating AI performance on open-ended tasks.
  • Regularly evaluate new frontier AI models and products using the task suite.
  • Analyze evaluation results and compare model performance across tasks.
  • Create public-facing reports, blog posts, and data visualizations based on research findings.
  • Ensure evaluation work informs other research topics, and keep the team updated on findings.
  • Automate parts of the workflow where useful and help build standalone benchmarks from the evaluation process.

Requirements

  • Experience conducting research and data analysis with enough comfort in light coding to analyze results.
  • Strong analytical thinking and the ability to run rigorous, evidence-based experiments.
  • A grounded, skeptical perspective on AI capabilities and limitations.
  • Comfort working with AI agents and tools, including delegating tasks to AI systems.
  • Familiarity with AI benchmarks and evaluations, with opinions on what they can and cannot show.
  • Strong written communication skills for clear, precise explanations of nuanced observations.
  • Preferred: experience testing frontier models and writing assessments of their capabilities.
  • Preferred: coding skills, including Python proficiency.
  • Professional-level English proficiency and the ability to submit application materials in English.
  • Preference for candidates whose working hours can overlap with both the UTC-8 and UTC time zones and who can travel for three annual retreats.

Benefits

  • Annual salary of $115,000–$200,000 USD, depending on location and experience.
  • Fully remote work with flexible hours.
  • Comprehensive global benefits, including health insurance, supplemental local benefits where available, life insurance, and a pension plan if applicable.
  • Generous PTO, including 30 protected days per year, unlimited personal and sick leave, and 4 months of paid parental leave for eligible permanent staff.
  • Flexible expense budget for equipment, productivity tools, learning and development, and unlimited AI tools spending subject to approval.
  • Paid work trips, including three staff retreats per year and relevant conferences.
  • Access to well-equipped Berkeley offices with paid meals, snacks, gym access, and at least 20 in-office days per year for all staff.
