Omilia

Omilia is a global leader in Conversational AI, offering AI-based self-service solutions for enhanced customer care fulfillment and success.

IT Services

Information Technology

251-1K (360)

Founded 2002

$20M raised

32 open positions

Senior Data Architect

3 months ago

Spain, Czech Republic, Poland, Portugal, Greece

Full-time

Senior

Data Architect

Data Science and Analytics

Apache Airflow AWS dbt LLM Python SageMaker Snowflake spaCy SQL

Description

Own the end-to-end data architecture for the training environment, including dataset design, schema definition, and data flow from production to training systems.
Define and govern data selection and sampling strategies for production conversations, including diversity optimization, confidence-based filtering, edge-case prioritization, and deduplication.
Build and maintain the data catalog and dataset discovery infrastructure so ML teams can find, understand, and use training data efficiently.
Define annotation pipeline requirements for intent labeling, entity tagging, dialog act classification, task completion scoring, and agentic reasoning evaluation.
Design and maintain the closed-loop data flywheel that moves conversations from production through curation, annotation, model retraining, evaluation, and redeployment.
Own data pipelines and infrastructure across Snowflake, AWS S3, Airflow, and AWS SageMaker-integrated ML workflows.
Work directly with LLM, NLU, Speech, and Agentic teams to translate model data needs into dataset specifications and pipeline configurations.
Define data quality frameworks and methods for extracting targeted corpora from low-confidence, no-match, and other failure-case data to improve model outcomes.
Evaluate and manage external data annotation vendors and ensure annotation workflows produce consistent, high-quality labels at scale.
Maintain documentation, dataset lineage, architecture RFCs, and best practices for the broader ML organization.

Requirements

5+ years of experience in data architecture, data engineering, or LLM/ML data infrastructure with ownership of production data systems supporting model development.
Strong understanding of what makes training data high-quality, diverse, and useful for LLM and NLU model development.
Deep experience with data modeling, schema design, and data pipeline architecture.
Strong proficiency with Snowflake, AWS S3, and ETL/ELT orchestration and transformation tools such as Airflow, dbt, or similar.
Experience defining annotation requirements and managing data labeling workflows such as intent labeling, entity tagging, or dialog classification.
Experience with data cataloging, metadata management, and dataset discovery at scale.
Strong SQL and Python skills for data pipeline development and data quality analysis.
Experience with data quality frameworks, including deduplication, sampling strategies, and diversity optimization.
Master’s degree or PhD in Computer Science, Data Engineering, Information Systems, or a related field.
Preferred experience with LLM training data preparation, including instruction tuning, preference data, RLHF/DPO annotation, or synthetic data generation.
Preferred experience with data anonymization and PII/PCI redaction in ML data pipelines.
Preferred familiarity with AWS SageMaker integration, active learning, and data selection strategies.
Preferred knowledge of voice/audio data handling, storage, and processing at scale.
Experience with conversational AI data such as dialog transcripts, ASR outputs, and NLU annotations is a strong advantage.
Experience with data governance in regulated industries such as financial services or healthcare is a plus.
Familiarity with NER/NLU-based data processing approaches such as spaCy, HuggingFace, or custom entity recognition is desirable.

Benefits

Fixed compensation.
Long-term employment with vacation days.
Professional development support, including courses and training.
Opportunity to work on cutting-edge technology products with global impact.
Collaborative, fun-to-work-with colleagues.
Apple gear provided.
Equal opportunity employer commitment with a diverse and inclusive workplace.

Similar Roles

Databricks Architect - R01568474

Brillio 1K-5K IT Services

Brillio is hiring a Databricks Architect to lead the design and implementation of enterprise Data & AI platforms for analytics, modernization, and AI adoption.

United States Full-time Lead Data Architect

$180k-$190k

Apache Spark Databricks GCP Generative AI LLM Machine Learning Vertex AI

1 day, 18 hours ago

Senior Google Cloud Data Architect

MPS 1-10 Professional Services

Google Cloud Data Architect at a company delivering international cloud transformation projects, responsible for designing modern GCP data platforms and supporting clients from solution design through pre-sales and delivery handover.

Hungary Full-time Senior Data Architect Sales Engineer

GCP Generative AI Looker

1 day, 18 hours ago

Senior Data Solutions Architect (Remote - Brazil)

M Criminal Defense / The M Criminal Law Criminal Defense Law

Michael & Associates is hiring a Senior Data Solutions Architect to build end-to-end data, automation, and AI systems that improve operations and support the firm’s growth.

Brazil Full-time Senior AI Engineer Data Architect

dbt LLM Machine Learning Metabase Python SQL Tableau

3 days, 18 hours ago

Data Modeler

Gritter Francona 1-10 Internet Software & Services

Gritter Francona is hiring a Data Modeler to support the Veterans Health Administration’s Veterans Family Member Program modernization by designing and maintaining enterprise data models and database structures.

United States Full-time Mid Level Data Architect

4 days, 18 hours ago