Synthetic Data Engineer (AI Data/Training)

1 month, 1 week ago
Mid Level
Data Science and Analytics
Hyphen Connect

Hyphen Connect is a Web3 and AI talent agency that specializes in recruitment and staffing solutions for the blockchain and artificial intelligence sectors. The company connects talent with businesses in these industries, drawing on its Web3 expertise to meet specific project needs. Its services include recruitment and headhunting tailored to Web3 and AI requirements, as well as recruitment process outsourcing (RPO). Hyphen Connect also provides talent sourcing from a curated network of professionals, HR consulting for early-stage startups, and career development programs to help individuals succeed in the AI and Web3 fields. Additionally, the company helps organizations build strong employer brands and strengthen internal engagement.

Industry: Staffing & Recruiting
Company size: 1-10 employees
Founded: 2024

Description

  • Design domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting.
  • Implement automated quality scoring systems for generated data.
  • Build de-duplication systems to improve dataset quality.
  • Manage data pipelines that feed into supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) training loops.
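The duties above (de-duplication, automated quality scoring, and feeding SFT/DPO training data) can be illustrated with a small Python sketch. This is not Hyphen Connect's pipeline or any client's. Every specific in it is an assumption chosen for illustration: the normalization rule, the word-trigram Jaccard near-duplicate check (a naive stand-in for techniques such as MinHash), the heuristic quality_score and its thresholds, and the record field names (prompt/completion for SFT, prompt/chosen/rejected for DPO).

```python
import hashlib
import re
from typing import Iterable


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variants hash identically.
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    # Word n-grams of the normalized text, used for near-duplicate comparison.
    words = normalize(text).split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    # Overlap of two shingle sets: |A & B| / |A | B|.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def quality_score(example: dict) -> float:
    # Hypothetical heuristic: reward longer responses (capped at 20 words)
    # and penalize repeated words. Real scorers are often model-based.
    words = example["response"].split()
    if not words:
        return 0.0
    length_term = min(len(words) / 20, 1.0)
    diversity_term = len(set(words)) / len(words)
    return 0.7 * length_term + 0.3 * diversity_term


def filter_examples(examples: Iterable[dict], min_score: float = 0.5,
                    near_dup_threshold: float = 0.8) -> list[dict]:
    # Keep examples that are not exact or near duplicates and pass the quality bar.
    seen_hashes: set[str] = set()
    kept_shingles: list[set] = []
    kept: list[dict] = []
    for ex in examples:
        key = normalize(ex["prompt"] + " " + ex["response"])
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        sh = shingles(key)
        if any(jaccard(sh, other) >= near_dup_threshold for other in kept_shingles):
            continue  # near duplicate of an example already kept
        if quality_score(ex) < min_score:
            continue  # below the (hypothetical) quality threshold
        seen_hashes.add(digest)
        kept_shingles.append(sh)
        kept.append(ex)
    return kept


def to_sft(ex: dict) -> dict:
    # Assumed prompt/completion record shape for supervised fine-tuning.
    return {"prompt": ex["prompt"], "completion": ex["response"]}


def to_dpo(prompt: str, chosen: str, rejected: str) -> dict:
    # Assumed preference-pair record shape for DPO training.
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```

The pairwise near-duplicate loop is O(n²), which is acceptable for a sketch. At dataset scale it would typically be replaced by locality-sensitive hashing and run on a distributed engine.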

Requirements

  • Proven experience building large-scale data pipelines.
  • Experience with Airflow, Spark, or Ray.
  • Deep knowledge of prompt engineering for data generation.
  • Familiarity with dataset distillation and bias mitigation.
  • Experience designing synthetic data generation pipelines is preferred.

Interested in this position?

Apply directly on the company website


Similar Roles

Senior Data Engineer I

Samsara · 1K-5K · IT Services

Samsara is hiring a Senior Data Engineer to design and maintain central data lake pipelines that turn IoT and product data into trusted analytics datasets for analysis, model training, and dashboards.

Apache Airflow, Apache Spark, AWS, Azure, Dagster, Databricks, GCP, Git, GitHub, Machine Learning, Prefect, Python, REST API, SQL

Strategy Consultant - AI Training & Evaluation (MBB & Top-Tier Firms)

Mindrift.ai: Be the “I” in AI · Internet Software & Services

Toloka AI’s Mindrift team is hiring experienced strategy consultants to build consulting-themed learning environments and evaluation frameworks that help train and assess AI systems on high-level business reasoning.

Reinforcement Learning

Strategy Consultant - AI Training & Evaluation (MBB & Top-Tier Firms)

Mindrift.ai: Be the “I” in AI · Internet Software & Services

Toloka AI is hiring experienced strategy consultants to build consulting-focused learning environments and evaluation frameworks that help train and assess next-generation AI systems.

LLM, Machine Learning, Reinforcement Learning

Strategy Consultant - AI Training & Evaluation (MBB & Top-Tier Firms)

Mindrift.ai: Be the “I” in AI · Internet Software & Services

Mindrift, powered by Toloka, is hiring remote strategy consultants to turn real management consulting engagements into structured learning environments and evaluation tasks for AI systems.

Reinforcement Learning
