Omilia

Omilia is a global leader in Conversational AI, offering AI-based self-service solutions for enhanced customer care fulfillment and success.

IT Services
251-1K
Founded 2002
$20M raised

Description

  • Own the end-to-end data architecture for the training environment, including dataset design, schema definition, and data flow from production to training systems.
  • Define and govern data selection and sampling strategies for production conversations, including diversity optimization, confidence-based filtering, edge-case prioritization, and deduplication.
  • Build and maintain the data catalog and dataset discovery infrastructure so ML teams can find, understand, and use training data efficiently.
  • Define annotation pipeline requirements for intent labeling, entity tagging, dialog act classification, task completion scoring, and agentic reasoning evaluation.
  • Design and maintain the closed-loop data flywheel that moves conversations from production through curation, annotation, model retraining, evaluation, and redeployment.
  • Own data pipelines and infrastructure across Snowflake, AWS S3, Airflow, and AWS SageMaker-integrated ML workflows.
  • Work directly with LLM, NLU, Speech, and Agentic teams to translate model data needs into dataset specifications and pipeline configurations.
  • Define data quality frameworks and targeted corpus extraction methods that use low-confidence, no-match, and other failure-case data to improve model outcomes.
  • Evaluate and manage external data annotation vendors and ensure annotation workflows produce consistent, high-quality labels at scale.
  • Maintain documentation, dataset lineage, architecture RFCs, and best practices for the broader ML organization.

Requirements

  • 5+ years of experience in data architecture, data engineering, or LLM/ML data infrastructure, including ownership of production data systems that support model development.
  • Strong understanding of what makes training data high-quality, diverse, and useful for LLM and NLU model development.
  • Deep experience with data modeling, schema design, and data pipeline architecture.
  • Strong proficiency with Snowflake, AWS S3, and ETL/ELT and orchestration tools such as Airflow, dbt, or similar.
  • Experience defining annotation requirements and managing data labeling workflows such as intent labeling, entity tagging, or dialog classification.
  • Experience with data cataloging, metadata management, and dataset discovery at scale.
  • Strong SQL and Python skills for data pipeline development and data quality analysis.
  • Experience with data quality frameworks, including deduplication, sampling strategies, and diversity optimization.
  • Master’s degree or PhD in Computer Science, Data Engineering, Information Systems, or a related field.
  • Preferred experience with LLM training data preparation, including instruction tuning, preference data, RLHF/DPO annotation, or synthetic data generation.
  • Preferred experience with data anonymization and PII/PCI redaction in ML data pipelines.
  • Preferred familiarity with AWS SageMaker integration, active learning, and data selection strategies.
  • Preferred knowledge of voice/audio data handling, storage, and processing at scale.
  • Experience with conversational AI data such as dialog transcripts, ASR outputs, and NLU annotations is a strong advantage.
  • Experience with data governance in regulated industries such as financial services or healthcare is a plus.
  • Familiarity with NER/NLU-based data processing tools and approaches, such as spaCy, HuggingFace, or custom entity recognition, is desirable.

Benefits

  • Fixed compensation.
  • Long-term employment with vacation days.
  • Professional development support, including courses and training.
  • Opportunity to work on cutting-edge technology products with global impact.
  • Collaborative colleagues who are fun to work with.
  • Apple gear provided.
  • Commitment to equal opportunity employment and a diverse, inclusive workplace.

Interested in this position?

Apply directly on the company website

