Synthetic Data Engineer (AI Data/Training)

6 days, 18 hours ago
Senior
Data Science and Analytics
Hyphen Connect

Hyphen Connect

Hyphen Connect is a Web3 and AI talent agency that specializes in recruitment and staffing solutions for the blockchain and artificial intelligence sectors. The company focuses on connecting talent with businesses in these industries, utilizing its deep expertise in Web3 to meet specific project needs. The agency offers a range of services, including recruitment and headhunting tailored to Web3 and AI requirements, as well as recruitment process outsourcing (RPO) solutions. Hyphen Connect also provides talent sourcing from a curated network of professionals, HR consulting for early-stage startups, and career development programs to help individuals succeed in the AI and Web3 fields. Additionally, the company assists organizations in building strong employer brands and enhancing internal engagement.

staffing & recruiting
1-10
Founded 2024

Description

  • Design domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting.
  • Implement automated quality scoring and de-duplication systems.
  • Manage data pipelines that feed directly into SFT and DPO training loops.

Requirements

  • Proven experience building large-scale data pipelines using tools such as Airflow, Spark, or Ray.
  • Deep knowledge of prompt engineering for data generation.
  • Familiarity with dataset distillation and bias mitigation.

Interested in this position?

Apply directly on the company website

Apply Now

Similar Roles

Data Platform Engineer

Lola Blankets 1-10 Textiles, Apparel & Luxury Goods

Lola Blankets is hiring a Data Platform Engineer to own its analytics platform and support engineering work across product, operations, integrations, and platform reliability.

Apache Airflow Dagster dbt LLM Prefect Python Snowflake SQL TypeScript
3 hours, 50 minutes ago

Oracle Data Engineer (with German Language)

Soname Solutions 11-50 Internet Software & Services

Soname Solutions is seeking a Senior Data Warehouse Developer to support a German telecom client by designing, optimizing, and evolving its multi-layer data warehouse environment.

Oracle PostgreSQL Power BI SQL
5 hours, 53 minutes ago

Senior Data Engineer

Alpaca 51-250 Capital Markets

Alpaca is seeking a Senior Data Platform Engineer to build and operate the data management layer for its global brokerage infrastructure as it scales across customers, jurisdictions, and high-volume financial event streams.

Apache Airflow dbt Docker GCP Helm Kafka Kubernetes Python SQL Terraform Trino
8 hours, 12 minutes ago

Data Engineering Lead

Seer Interactive 51-250 Professional Services

Seer Interactive is hiring a remote Data Engineering Lead to own the technical direction of its data platform, shaping architecture, data modeling, and AI-enabled workflows for marketing analytics across the company and its clients.

Agile Apache Airflow dbt GCP Git Looker Pytest Python SQL
9 hours, 40 minutes ago

You're on a roll! Sign up now to keep applying.

Sign Up

Already have an account? Log in

Used by 14,729+ remote workers