Role Overview:
Data quality and diversity are among the most important factors in training state-of-the-art models. As a member of the technical staff focused on data at Reflection, you will play a pivotal role in shaping how we collect, process, and analyze human and internet data for training AI models. You will design and execute protocols, develop tools and interfaces, and manage data workflows that underpin our RL and SFT efforts. Your work will combine creativity, precision, and technical expertise to ensure our datasets meet the highest quality standards and align with our mission.
Key Responsibilities:
1. Experiment Design and Data Collection:
Design experiments, protocols, and user interfaces to collect high-quality human data for training AI models or evaluating AI systems.
Determine the type of human data needed to optimize model performance.
Analyze collected data qualitatively and quantitatively, including PRs, issues, code reviews, and traces.
Ensure creativity and communication are integral to the data collection process.
2. Human Data Operations:
Manage projects, including tracking hours and handling payments for human raters.
Train and onboard raters, providing clear guidance and technical support.
Debug technical problems encountered by raters and ensure quality control through regular review of their responses.
Prioritize attention to detail, organization, and communication in all operations.
Apply coding skills to annotation tasks, such as code reviews and code generation.
3. Data Engineering:
Design, implement, and optimize data pipelines to support scalable data collection.
Leverage prompt engineering and integrate LLM solutions to improve processes.
Develop scalable and asynchronous systems for data operations.
4. Human-Facing Data Services:
Develop and manage human-facing services using tools like TypeScript, JavaScript, and Firebase.
Build backend services using frameworks like FastAPI or Go.
Utilize cloud platforms such as GCP and AWS to deploy and maintain services.
Qualifications:
Proven experience in experiment and protocol design for data collection.
Strong analytical skills with the ability to conduct both qualitative and quantitative data analysis.
Excellent organizational and communication skills to manage human data operations effectively.
Hands-on experience in data engineering, prompt engineering, and integrating LLM solutions.
Proficiency in coding, particularly for building human-facing services and performing annotation tasks.
Familiarity with tools and platforms like TypeScript, Firebase, FastAPI, GCP, and AWS.
Background in software engineering, including open-source contributions and code review experience.
Experience with ML/LLMs is a plus.
What We Offer:
The opportunity to work at the forefront of AI research and data collection for training cutting-edge models.
Collaboration with a team of world-class researchers and engineers from top AI labs and companies.
Competitive compensation and benefits, with opportunities for professional growth.
How to Apply: If our mission and approach inspire you, and you are curious and data-obsessed by nature, we want to hear from you. Consider applying to join the data team at Reflection.