Synthetic Data Engineer (AI Data/Training)

2 hours, 31 minutes ago
Mid Level
Data Science and Analytics
Hyphen Connect

Hyphen Connect is a Web3 and AI talent agency that provides recruitment and staffing for the blockchain and artificial intelligence sectors. Its services include recruitment and headhunting tailored to Web3 and AI roles, recruitment process outsourcing (RPO), talent sourcing from a curated network of professionals, HR consulting for early-stage startups, and career development programs for people building careers in AI and Web3. The company also helps organizations build strong employer brands and improve internal engagement.

staffing & recruiting
1-10
Founded 2024

Description

  • Design domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting.
  • Implement automated quality scoring systems for generated data.
  • Build and maintain de-duplication systems for synthetic datasets.
  • Manage data pipelines that feed directly into SFT and DPO training loops.
  • Support high-quality data management for training workflows.
  • Contribute to data processing and model training success within the organization.

Requirements

  • Proven experience building large-scale data pipelines.
  • Experience with Airflow, Spark, or Ray.
  • Deep knowledge of prompt engineering for data generation.
  • Familiarity with dataset distillation.
  • Familiarity with bias mitigation techniques.
