Synthetic Data Engineer (AI Data/Training)

7 hours, 43 minutes ago
Senior
Data Science and Analytics
Hyphen Connect

Hyphen Connect is a Web3 and AI talent agency that provides recruitment and staffing for the blockchain and artificial intelligence sectors, drawing on its Web3 expertise to match talent to specific project needs. Its services include recruitment and headhunting tailored to Web3 and AI roles, recruitment process outsourcing (RPO), talent sourcing from a curated network of professionals, HR consulting for early-stage startups, and career development programs for people entering the AI and Web3 fields. It also helps organizations build strong employer brands and improve internal engagement.

Industry: Staffing & Recruiting
Company size: 1-10 employees
Founded: 2024

Description

  • Design domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting.
  • Implement automated quality scoring systems for generated data.
  • Build de-duplication systems to improve dataset quality.
  • Manage data pipelines that feed directly into supervised fine-tuning (SFT) and direct preference optimization (DPO) training loops.

Requirements

  • Proven experience building large-scale data pipelines.
  • Experience with Airflow, Spark, or Ray.
  • Deep knowledge of prompt engineering for data generation.
  • Familiarity with dataset distillation.
  • Familiarity with bias mitigation techniques.
