Synthetic Data Engineer (AI Data/Training)

1 hour, 38 minutes ago
Mid Level
Data Science and Analytics
Hyphen Connect

Hyphen Connect is a Web3 and AI talent agency that provides recruitment and staffing for the blockchain and artificial intelligence sectors, drawing on its Web3 expertise to match talent to specific project needs. Its services include recruitment and headhunting tailored to Web3 and AI roles, recruitment process outsourcing (RPO), talent sourcing from a curated network of professionals, HR consulting for early-stage startups, and career development programs for people building careers in AI and Web3. The company also helps organizations build strong employer brands and improve internal engagement.

staffing & recruiting
1-10
Founded 2024

Description

  • Design domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting.
  • Implement automated quality scoring systems for generated data.
  • Build de-duplication systems to improve dataset quality.
  • Manage data pipelines that feed into supervised fine-tuning (SFT) and direct preference optimization (DPO) training loops.
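
As a rough illustration of the quality-scoring and de-duplication duties listed above, a minimal sketch might look like the following. Everything here is an assumption made for illustration and not taken from the posting: the function names (quality_score, filter_samples), the prompt/response record shape, the heuristic scoring rules, and the thresholds. Real pipelines of this kind often use a model-based judge for scoring and MinHash or embedding similarity for near-duplicate detection at scale.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so trivial variants match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def shingles(text: str, n: int = 3) -> set[str]:
    """Return the set of word n-grams of the normalized text."""
    words = normalize(text).split()
    if len(words) < n:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two shingle sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def quality_score(sample: dict) -> float:
    """Hypothetical heuristic score in [0, 1]; not a method named in the posting."""
    score = 1.0
    if len(sample["response"].split()) < 5:
        score -= 0.5  # penalize very short responses
    if normalize(sample["response"]) == normalize(sample["prompt"]):
        score -= 0.5  # penalize responses that just echo the prompt
    return max(score, 0.0)


def filter_samples(samples, min_score=0.6, max_similarity=0.8):
    """Drop low-scoring samples, exact duplicates, and near-duplicates."""
    kept, seen_hashes, kept_shingles = [], set(), []
    for sample in samples:
        if quality_score(sample) < min_score:
            continue
        text = sample["prompt"] + " " + sample["response"]
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        current = shingles(text)
        if any(jaccard(current, other) >= max_similarity for other in kept_shingles):
            continue  # near-duplicate of a sample already kept
        seen_hashes.add(digest)
        kept_shingles.append(current)
        kept.append(sample)
    return kept
```

The pairwise Jaccard comparison is quadratic in the number of kept samples, so this sketch only suits small batches.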

Requirements

  • Proven experience building large-scale data pipelines.
  • Experience with Airflow, Spark, or Ray.
  • Deep knowledge of prompt engineering for data generation.
  • Familiarity with dataset distillation and bias mitigation.
  • Experience designing synthetic data generation pipelines is preferred.
