Full-Time

RL Environments

Bespoke Labs

1-10 employees

No salary listed

Mountain View, CA, USA

Hybrid

Category
AI & Machine Learning
Required Skills
Python
PyTorch
Machine Learning
AWS
Data Analysis
Reinforcement Learning
Google Cloud Platform
Requirements
  • Strong foundation in machine learning, gained through a PhD/MS in ML or Computer Science or through equivalent industry experience
  • Deep curiosity about agent behavior and failure modes, with ability to form hypotheses and test them systematically
  • Experience analyzing complex systems and extracting actionable insights from data
  • Patience and attention to detail for studying agent rollouts and identifying subtle patterns
  • Proficiency in Python and ML frameworks (PyTorch, JAX, or similar)
  • Experience with reinforcement learning concepts and agent training, even if not from a reinforcement learning background
  • Ability to design experiments, run training loops, and interpret results
  • Comfort working with cloud platforms (Google Cloud Platform, Amazon Web Services) for running experiments at scale
  • Ability to build pipelines and automation that scale research insights into production
  • Experience with data analysis tools and creating reproducible workflows
  • Systematic approach to quality verification and testing
Responsibilities
  • Develop systematic strategies and recipes for creating high-quality RL environments that effectively train and evaluate agents
  • Study how LLMs and agents fail across different task types, identifying patterns that inform better environment design
  • Create benchmark environments that test specific agent capabilities, packaging them for external release on our evaluation platform
  • Verify environment quality through hands-on testing—training small-scale agents, checking for reward hacking, and analyzing training dynamics
  • Work with our environment creation pipeline to scale production of validated environments
  • Analyze agent rollout data to uncover insights about what makes environments challenging, diverse, and pedagogically valuable
  • Collaborate with the team to ensure benchmarks integrate smoothly into our external-facing dashboards
  • Establish quality standards and evaluation protocols that maintain high bars as we scale environment production
Desired Qualifications
  • Hands-on experience with reinforcement learning or agent training systems
  • Background in data curation, dataset creation, or evaluation benchmark design
  • Experience with AI safety, robustness testing, or adversarial evaluation
  • Publications or projects related to RL, agent evaluation, or data-centric AI
  • Understanding of how to design environments that surface specific failure modes
  • Experience shipping research artifacts (datasets, benchmarks, evaluation suites) to the community

Company Size

1-10

Company Stage

N/A

Total Funding

N/A

Headquarters

Mountain View, California

Founded

2024

Simplify's Take

What believers are saying

  • OpenThoughts3-7B achieves SOTA 53% on AIME 2025, driving adoption.
  • Reasoning datasets competition with Hugging Face boosts community influence.
  • Curator library enables scalable synthetic data generation for enterprises.

What critics are saying

  • Scale AI expansions capture Bespoke's post-training dataset niche within 6-12 months.
  • Hugging Face collections surpass OpenThoughts, ending SOTA in 12-18 months.
  • DeepMind poaches CEO Mahesh Sathiamoorthy's network, causing departures in 12-24 months.

What makes Bespoke Labs unique

  • OpenThoughts-114k dataset powers over 230 models, leading open reasoning data.
  • OpenThinker-32B matches DeepSeek-R1-Distill-32B on AIME benchmarks.
  • OpenThoughts-Agent curates superior agent training datasets collaboratively.

Benefits

Health Insurance

Hybrid Work Options

Remote Work Options

Flexible Work Hours

Wellness Program

Mental Health Support

Conference Attendance Budget

Professional Development Budget

Stock Options

Company Equity

401(k) Retirement Plan

401(k) Company Match

Paid Vacation

Paid Holidays

Paid Sick Leave

Parental Leave

Fertility Treatment Support

Family Planning Benefits

Adoption Assistance

Home Office Stipend

Phone/Internet Stipend