Research Engineer, Evaluations

3 weeks, 4 days ago
Full-time
Senior
Software Development
AssemblyAI

AssemblyAI is a leading provider of AI models for transcribing and understanding speech. Their Speech AI models offer accurate speech-to-text conversion, speaker detection, sentiment analysis, and more, enabling users to extract valuable insights from ...

Media
51-250
Founded 2017
$63M raised

Description

  • Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics.
  • Build and maintain competitive benchmarking pipelines against other providers in the market.
  • Design and run systematic experiments to measure the impact of model changes.
  • Onboard, curate, and maintain evaluation datasets, including public benchmarks and internal test sets.
  • Create evaluation subsets that stress-test specific capabilities and edge cases.
  • Define evaluation metrics that capture real-world performance.
  • Translate qualitative customer feedback into quantifiable evaluation criteria.
  • Work with customer-facing teams to understand pain points and turn them into research priorities.
  • Maintain clean evaluation pipelines and clear documentation to reduce friction for researchers.
  • Proactively identify evaluation gaps and propose solutions.

Requirements

  • Understanding of machine learning fundamentals, including how models are trained and evaluated.
  • Strong Python skills for writing evaluation scripts and working with data pipelines.
  • Comfort working with SQL and cloud infrastructure.
  • Strong intuition for evaluation metrics, including relative vs. absolute improvements and statistical rigor.
  • Familiarity with the voice agent stack, including VAD, ASR, turn detection, LLM, and TTS.
  • Ability to communicate technical results to researchers, leadership, and customer-facing teams.
  • Ownership mindset and ability to independently identify and fill evaluation gaps.
  • Tinkerer mentality and willingness to ship rough versions and iterate quickly.
  • Availability to work at least 3-4 hours overlapping with the Eastern US time zone.
  • Experience with speech/audio ML or real-time systems (preferred).
  • Hands-on experience with voice agent orchestrators such as LiveKit, Pipecat, or Vapi (preferred).
  • Familiarity with standard ML evaluation practices and benchmarks (preferred).
  • Experience working with customer-facing or product teams (preferred).
  • Background in QA, data science, or applied ML roles (preferred).

Benefits

  • Salary range of $210,000 - $260,000.
  • Competitive compensation structure with opportunities for additional rewards and benefits.
  • Opportunity to work at a small, high-growth company with outsized ownership and impact.
  • A lean environment with fewer layers of bureaucracy and faster decision-making.
  • Exposure to meaningful scale in a proven business serving major customers.
  • Commitment to pay equity and consideration of relevant experience and qualifications in compensation decisions.
