Synthetic Data Engineer (AI Data/Training)

2 hours, 17 minutes ago
Mid Level
Data Science and Analytics
Hyphen Connect

Hyphen Connect is a Web3 and AI talent agency providing recruitment and staffing for the blockchain and artificial intelligence sectors, drawing on its Web3 expertise to match talent to specific project needs. Its services include recruitment and headhunting tailored to Web3 and AI roles, recruitment process outsourcing (RPO), talent sourcing from a curated network of professionals, HR consulting for early-stage startups, and career development programs for people building careers in AI and Web3. The company also helps organizations build strong employer brands and improve internal engagement.

Industry: Staffing & Recruiting
Company size: 1-10
Founded: 2024

Description

  • Design domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting.
  • Implement automated quality scoring systems for generated data.
  • Build and maintain de-duplication systems for synthetic datasets.
  • Manage data pipelines that feed into SFT and DPO training loops.
  • Support data processing and model training workflows within the organization.

Requirements

  • Proven experience building large-scale data pipelines.
  • Experience with Airflow, Spark, or Ray.
  • Deep knowledge of prompt engineering for data generation.
  • Familiarity with dataset distillation and bias mitigation.
  • Experience working on synthetic data generation pipelines.
  • Experience supporting training loops for model development.

