Research Engineer, Evaluations

1 hour, 49 minutes ago
Full-time
Senior
Software Development
AssemblyAI

AssemblyAI is a leading provider of AI models for transcribing and understanding speech. Their Speech AI models offer accurate speech-to-text conversion, speaker detection, sentiment analysis, and more, enabling users to extract valuable insights from ...

Media
51-250
Founded 2017
$63M raised

Description

  • Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics.
  • Build and maintain competitive benchmarking pipelines against other providers in the market.
  • Design and run systematic experiments to measure the impact of model changes.
  • Onboard, curate, and maintain evaluation datasets, including public benchmarks and internal test sets.
  • Create evaluation subsets that stress-test specific capabilities and edge cases.
  • Define evaluation metrics that capture real-world performance.
  • Translate qualitative customer feedback into quantifiable evaluation criteria.
  • Work with customer-facing teams to understand pain points and turn them into research priorities.
  • Maintain clean evaluation pipelines and clear documentation to reduce friction for researchers.
  • Proactively identify evaluation gaps and propose solutions.

Requirements

  • Understanding of machine learning fundamentals, including how models are trained and evaluated.
  • Strong Python skills for writing evaluation scripts and working with data pipelines.
  • Comfort working with SQL and cloud infrastructure.
  • Strong intuition for evaluation metrics, including relative vs. absolute improvements and statistical rigor.
  • Familiarity with the voice agent stack, including VAD, ASR, turn detection, LLM, and TTS.
  • Ability to communicate technical results to researchers, leadership, and customer-facing teams.
  • Ownership mindset and ability to independently identify and fill evaluation gaps.
  • Tinkerer mentality and willingness to ship rough versions and iterate quickly.
  • Availability to work at least 3-4 hours overlapping with the US Eastern time zone.
  • Experience with speech/audio ML or real-time systems (preferred).
  • Hands-on experience with voice agent orchestrators such as LiveKit, Pipecat, or Vapi (preferred).
  • Familiarity with standard ML evaluation practices and benchmarks (preferred).
  • Experience working with customer-facing or product teams (preferred).
  • Background in QA, data science, or applied ML roles (preferred).

Benefits

  • Salary range of $210,000 - $260,000.
  • Competitive compensation structure with opportunities for additional rewards and benefits.
  • Opportunity to work at a small, high-growth company with outsized ownership and impact.
  • A lean environment with fewer layers of bureaucracy and faster decision-making.
  • Exposure to meaningful scale in a proven business serving major customers.
  • Commitment to pay equity and consideration of relevant experience and qualifications in compensation decisions.
