Outreach

Outreach offers an AI-powered revenue workflow platform that helps sales, marketing, RevOps, and go-to-market teams work more productively by automating the processes involved in managing accounts, forecasting, and closing deals.

Internet Software & Services
1K-5K employees
Founded 2014
$489M raised

Description

  • Own the end-to-end AI quality strategy for Outreach’s GenAI platform, agent workflows, LangGraph orchestration, and supporting ML pipelines.
  • Design and implement evaluation frameworks that combine deterministic checks, golden dataset testing, and LLM-as-judge grading.
  • Test the company’s AI agents end to end, including functional correctness, tool selection accuracy, context handling, and response quality.
  • Partner with Data Science, MLOps, Engineering, and Product teams to design testability into systems from the start.
  • Integrate evaluation pipelines into CI/CD workflows to catch regressions before production releases.
  • Define and track AI quality metrics such as answer quality, tool invocation accuracy, hallucination rates, latency, and regression trends.
  • Establish org-wide standards for prompt regression testing, retrieval quality evaluation, and agent behavior contracts.
  • Mentor engineers, review designs for testability, and promote quality-driven development practices across teams.
  • Stay current on AI evaluation tooling, LLM benchmarking, and testing research, and apply relevant advances to the team’s practices.

Requirements

  • 7–12 years of experience in software development and/or test automation, including leading quality efforts on complex, distributed systems.
  • B.S. in Computer Science or a related technical field.
  • Strong programming skills in Python, with experience building reusable and maintainable test frameworks.
  • Experience testing large-scale backend or platform systems, including microservices and API layers.
  • Deep understanding of test design principles, CI/CD integration, and scalable test automation.
  • Experience with test frameworks such as pytest or an equivalent.
  • Solid understanding of evaluation methods for non-deterministic systems, including statistical assertions, behavioral testing, and regression baselines.
  • Hands-on experience with Databricks for building and validating ML pipelines and data workflows.
  • Experience with MLflow for experiment tracking, model versioning, and pipeline observability.
  • Strong communication and collaboration skills across engineering, data science, and product functions.
  • Preferred: experience testing GenAI products, LLM-based systems, or agentic AI platforms.
  • Preferred: experience with prompt engineering and prompt tuning, including regression testing for prompt-driven behavior changes.
  • Preferred: hands-on experience with LLM-as-judge evaluation patterns.
  • Preferred: familiarity with LangGraph, LangChain, or similar agent orchestration frameworks.
  • Preferred: experience with ML pipelines or related tooling such as Kubeflow, Metaflow, or similar.
  • Preferred: understanding of RAG architectures and retrieval quality evaluation.
  • Preferred: experience with cloud platforms such as AWS, GCP, or Azure and containerized environments such as Docker or Kubernetes.
  • Preferred: domain knowledge in sales, sales engagement, or CRM platforms such as Salesforce or HubSpot.
  • Preferred: prior experience contributing to AI quality strategies in a product or research environment.

Benefits

  • Remote full-time role.
  • Opportunity to work on Outreach’s leading agentic AI platform for revenue teams.
  • High-impact ownership in a strategic, senior-level role.
  • Work on a product used by major enterprise organizations such as Databricks, SAP, Siemens, and Verizon.
  • Inclusive hiring approach that encourages applicants even if they do not meet every requirement.
