Head of Data Engineering
Updated on 2/8/2024
AI systems for practical real-world applications
Imbue is a pioneering company in the AI industry, focusing on the development of reasoning AI systems that aim to enhance human-computer interaction and productivity. Its competitive edge lies in its approach of training foundation models optimized for reasoning, which it then uses to prototype agents that can accomplish larger goals safely in the real world. With a culture that values empowerment, freedom, and dignity, Imbue aims to redefine the concept of the personal computer, making it a compelling place to work.
AI & Machine Learning
Data & Analytics
San Francisco, California
- Passionate about data
- Excellent software engineer
- Great communicator
- Familiar with Python
- Lead data engineering efforts
- Coordinate human data collection processes
- Collect, filter, and preprocess raw web data, longer texts, code, and other generated data
- Direct and ensure constant improvement of data quality
- Measure and understand the quality of datasets
- Ensure quick and easy acquisition of human labels for datasets
- Coordinate efforts between multiple external organizations and within the team
- Work directly on creating software with human-like intelligence
We believe that high-quality data is the most important part of creating high-performance machine learning systems, whether they are simple classifiers or state-of-the-art reasoning agents. We view this work as one of the most important functions at the company, and we want someone who is dedicated solely to coordinating our efforts across the diverse range of data that matters to us.
In this role, you will lead our data engineering efforts. You will coordinate everything from human data collection processes to the collection, filtering, and preprocessing of raw web data, longer texts, code, and other generated data. You will be responsible for the ultimate quality and quantity of the data on which we train our systems, which is the primary factor in their performance. You will both direct this work and get into the details yourself to ensure that our data quality is constantly improving.
• Scan one million physical books and convert them into high quality pretraining data.
• Find 90% of the most useful text available online and make clean training data from it.
• Generate pretraining data in ways that are guaranteed to have low error.
• Measure and understand the quality of each of our datasets.
• Ensure that researchers and engineers can quickly and easily acquire human labels for a dataset.
• Passionate about data. You should be happy to look at and deeply engage with the raw data.
• An excellent software engineer. We care about engineering best practices.
• A great communicator. You will need to coordinate efforts between multiple external organizations and within our own team.
• Familiar with Python.
Compensation and Benefits
• Work on the most important part of our system
• Work at a place that deeply cares about data quality
• Work directly on creating software with human-like intelligence.
• Generous compensation, equity, and benefits.
• $20K+ yearly budget for self-improvement: coaching, courses, conferences, etc.
• Actively co-create and participate in a positive, intentional team culture.
• Spend time learning, reading papers, and deeply understanding prior work.
• Frequent team events, dinners, off-sites, and hanging out.
• Compensation packages are highly variable based on a variety of factors. The range for this role is $170,000–$400,000 in cash and $500,000–$4,000,000 in equity. If your salary requirements fall outside this range, we still encourage you to apply.
How to apply
All submissions are reviewed by a person, so we encourage you to include notes on why you’re interested in working with us. If you have any other work that you can showcase (open source code, side projects, etc.), certainly include it! We know that talent comes from many backgrounds, and we aim to build a team with diverse skillsets that spike strongly in different areas.
We try to reply either way within a week or two at most (usually much sooner).
Imbue builds AI systems that reason and code, enabling AI agents to accomplish larger goals and safely work in the real world. We train our own foundation models optimized for reasoning and prototype agents on top of these models. By using these agents extensively, we gain insights into improving both the capabilities of the underlying models and the interaction design for agents.
We aim to rekindle the dream of the *personal* computer, where computers become truly intelligent tools that empower us, giving us freedom, dignity, and agency to pursue the things we love.