Synthetic Data Engineer (AI Data/Training)

3 weeks, 1 day ago
Mid Level
Data Science and Analytics
Hyphen Connect

Hyphen Connect is a Web3 and AI talent agency that provides recruitment and staffing for the blockchain and artificial intelligence sectors. Its services include recruitment and headhunting tailored to Web3 and AI roles, recruitment process outsourcing (RPO), talent sourcing from a curated network of professionals, HR consulting for early-stage startups, and career development programs for people building careers in AI and Web3. The company also helps organizations build strong employer brands and improve internal engagement.

staffing & recruiting
1-10
Founded 2024

Description

  • Design domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting.
  • Implement automated quality scoring systems for generated data.
  • Build de-duplication systems to improve dataset quality.
  • Manage data pipelines that feed into SFT and DPO training loops.

Requirements

  • Proven experience building large-scale data pipelines.
  • Experience with Airflow, Spark, or Ray.
  • Deep knowledge of prompt engineering for data generation.
  • Familiarity with dataset distillation.
  • Familiarity with bias mitigation.
  • Experience designing synthetic data generation pipelines is preferred.
