Full-Time

Synthetic Data Engineer

AI Data/Training

Hyphen Connect

No salary listed

Seattle, WA, USA

In Person

Category
Data & Analytics
Required Skills
LLM
Airflow
Apache Spark
Requirements
  • Proven experience building large-scale data pipelines (Airflow, Spark, Ray).
  • Deep knowledge of prompt engineering for data generation.
  • Familiarity with dataset distillation and bias mitigation.
Responsibilities
  • Design domain-specific synthetic data generation (SDG) pipelines via self-instruct and constitutional prompting.
  • Implement automated quality scoring and de-duplication systems.
  • Manage data pipelines that feed directly into Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) training loops.

Company Size

N/A

Company Stage

N/A

Total Funding

N/A

Headquarters

N/A

Founded

N/A