Synthetic Data Engineer (AI Data/Training)

Posted: 3 weeks, 5 days ago
Seniority: Senior
Category: Data Science and Analytics
Company: Hyphen Connect

Hyphen Connect

Hyphen Connect is a talent agency that provides recruitment and staffing for the blockchain (Web3) and artificial intelligence sectors, drawing on its Web3 expertise to match talent to specific project needs. Its services include recruitment and headhunting tailored to Web3 and AI roles, recruitment process outsourcing (RPO), talent sourcing from a curated network of professionals, HR consulting for early-stage startups, and career development programs for people building careers in AI and Web3. The company also helps organizations build their employer brand and improve internal engagement.

Industry: Staffing & recruiting
Company size: 1-10
Founded: 2024

Description

  • Design domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting.
  • Implement automated quality scoring and de-duplication systems.
  • Manage data pipelines that feed directly into supervised fine-tuning (SFT) and direct preference optimization (DPO) training loops.

Requirements

  • Proven experience building large-scale data pipelines using tools such as Airflow, Spark, or Ray.
  • Deep knowledge of prompt engineering for data generation.
  • Familiarity with dataset distillation and bias mitigation.
