Contract
Posted on 6/26/2025
SaaS data labeling and management platform
$25 - $38/hr
Remote in USA
Candidates must be based in the United States.
Labelbox provides a data-centric AI platform for creating and managing labeled training data for machine learning. It is offered as SaaS with tiered subscriptions based on data volume, users, and features, plus professional services. The platform acts as a data factory with three parts: an enterprise data-management platform, the Alignerr labeling service, and an expert marketplace. It supports images, video, and text with workflow automation, quality checks, and real-time collaboration, plus model-assisted labeling and API-first ML pipeline integration. Its goal is to help enterprises produce high-quality training data quickly to speed AI development and improve model performance.
Company Size: 201-500
Company Stage: Series D
Total Funding: $188.9M
Headquarters: San Francisco, California
Founded: 2018
Competitive remuneration
Flexible vacation policy (we don't count PTO Days)
401k Program
College savings account
HSA
Daily lunches paid for by the company (especially convenient while working from home)
Virtual wellness and guided meditation programs
Dog-friendly office
Regular company social events (happy hours, off-sites)
Professional development benefits and resources
Remote friendly (we hire in-office and remote employees)
Meta pauses work with Mercor after data breach puts AI industry secrets at risk.

Major AI labs are investigating a security incident that impacted Mercor, a leading data vendor. The incident could have exposed key data about how they train AI models.

Meta has paused all its work with the data contracting firm Mercor while it investigates a major security breach that impacted the startup, two sources confirmed to WIRED. The pause is indefinite, the sources said. Other major AI labs are also reevaluating their work with Mercor as they assess the scope of the incident, according to people familiar with the matter.

Mercor is one of a few firms that OpenAI, Anthropic, and other AI labs rely on to generate training data for their models. The company hires massive networks of human contractors to generate bespoke, proprietary datasets for these labs, which are typically kept highly secret because they are a core ingredient in the recipe for valuable AI models that power products like ChatGPT and Claude Code. AI labs are sensitive about this data because it can reveal to competitors - including other AI labs in the US and China - key details about how they train AI models. It's unclear at this time whether the data exposed in Mercor's breach would meaningfully help a competitor.

While OpenAI has not stopped its current projects with Mercor, it is investigating the startup's security incident to see how its proprietary training data may have been exposed, a spokesperson for the company confirmed to WIRED. The spokesperson says the incident in no way affects OpenAI user data, however. Anthropic did not immediately respond to WIRED's request for comment.

Mercor confirmed the attack in an email to staff on March 31. "There was a recent security incident that affected our systems along with thousands of other organizations worldwide," the company wrote. A Mercor employee echoed these points in a message to contractors on Thursday, WIRED has learned.
Contractors who were staffed on Meta projects cannot log hours until - and if - the project resumes, meaning they could functionally be out of work, a source familiar with the matter claims. The company is working to find additional projects for those impacted, according to internal conversations viewed by WIRED. Mercor contractors were not told exactly why their Meta projects were being paused. In a Slack channel related to the Chordus initiative - a Meta-specific project to teach AI models to use multiple internet sources to verify their responses to user queries - a project lead told staff that Mercor was "currently reassessing the project scope."

An attacker known as TeamPCP appears to have recently compromised two versions of the AI API tool LiteLLM. The breach exposed companies and services that incorporated LiteLLM and installed the tainted updates. There could be thousands of victims, including other major AI companies, but the breach at Mercor illustrates the sensitivity of the compromised data.

Mercor and its competitors - such as Surge, Handshake, Turing, Labelbox, and Scale AI - have developed a reputation for being incredibly secretive about the services they offer to major AI labs. It's rare to see the CEOs of these firms speak publicly about the specific work they do, and they internally use codenames for their projects.

Adding to the confusion around the hack, a group going by the well-known name Lapsus$ claimed this week that it had breached Mercor. On a Telegram account and on a BreachForums clone, the actor offered to sell an array of alleged Mercor data, including a 200-plus GB database, nearly 1 TB of source code, and 3 TB of video and other information. But researchers say that many cybercriminal groups now periodically take up the Lapsus$ name, and that Mercor's confirmation of the LiteLLM connection means the attacker is likely TeamPCP or an actor connected to the group.
TeamPCP appears to have compromised the two LiteLLM updates as part of a larger supply-chain hacking spree in recent months, one that has been gaining momentum and catapulting TeamPCP to prominence. Alongside data extortion attacks and collaboration with ransomware groups, such as the group known as Vect, TeamPCP has also strayed into political territory, spreading a data-wiping worm known as "CanisterWorm" through vulnerable cloud instances with Farsi as their default language or clocks set to Iran's time zone.

"TeamPCP is definitely financially motivated," says Allan Liska, an analyst for the security firm Recorded Future who specializes in ransomware. "There might be some geopolitical stuff as well, but it's hard to determine what's real and what's bluster, especially with a group this new." Looking at the dark-web posts of the alleged Mercor data, Liska adds, "There is absolutely nothing that connects this to the original Lapsus$."
Labelbox, a data factory trusted by top AI labs, has acquired Upcraft, an AI-powered sales automation startup founded in 2021. The acquisition will enhance how Labelbox scales its Alignerr network of over 1 million domain experts who train and evaluate advanced AI models. Upcraft's AI agent technology will be integrated into Labelbox's infrastructure to automate expert recruitment and engagement workflows. The Chicago-based company specialises in automating sales outreach, qualification and engagement processes. "Upcraft's AI agent expertise will transform how we grow and operate the Alignerr network, enabling us to deliver the high-quality, expert-driven training data that defines the cutting edge of AI," said Manu Sharma, Labelbox CEO. Acquisition terms were not disclosed. Labelbox is backed by SoftBank, Andreessen Horowitz and Kleiner Perkins.
Labelbox acquires agentic sales automation startup, Upcraft, to rapidly scale the human expertise powering frontier AI.

By Labelbox | Feb 10, 2026

SAN FRANCISCO, Feb. 10, 2026 /PRNewswire/ - Labelbox, the leading data factory trusted by top AI labs and enterprises, has acquired Upcraft, a pioneer in AI-powered sales automation. This acquisition will enhance how Labelbox scales outreach and engagement within Alignerr, its network of over 1 million domain experts who evaluate, train, and improve the world's most advanced AI models with their expertise. The combination integrates Upcraft's AI agent technology into Labelbox's infrastructure, enabling automated workflows that accelerate the delivery of expert-quality training data at scale.

Founded in 2021, Upcraft has built AI agents that automate complex sales workflows. The team will apply this expertise to revolutionize how Alignerr interacts with the domain experts generating training data for AI models.

"After nearly five years building Upcraft, we're thrilled to bring our AI sales agent expertise to Labelbox," said Greg Caplan, Co-founder and CEO of Upcraft. "Labelbox's vision of helping the world's largest AI labs and hyperscalers advance superintelligence is inspiring. This acquisition lets us contribute to a platform with unmatched resources and reach, accelerating our mission to make AI more accessible and effective. Leading growth for Alignerr's expert ecosystem is particularly exciting. By applying the latest AI agent technology to engage experts more effectively, we can generate higher-quality data that improves the world's most capable AI models and unlocks their full potential. I'm deeply grateful to our investors, partners, and team for their support, and I'm excited for what lies ahead."

"Building frontier AI requires connecting elite domain experts with development teams at scale," said Manu Sharma, CEO of Labelbox.
"Upcraft's AI agent expertise will transform how we grow and operate the Alignerr network, enabling us to deliver the high-quality, expert-driven training data that defines the cutting edge of AI. We're excited to welcome Greg and the team."

The acquisition reflects the growing competition among AI companies to secure differentiated, expert-generated training data, a critical input as models advance toward sophisticated reasoning and domain-specific capabilities. By automating expert recruitment and engagement, Labelbox aims to maintain its leadership position as AI labs invest billions in post-training and reinforcement learning workflows. Terms of the acquisition were not disclosed.

About Labelbox

Labelbox is the leading data factory for frontier model development. Trusted by over 80% of leading AI labs in the US and hundreds of enterprises worldwide, Labelbox provides integrated software, managed services, and an expert network that enable organizations to create the high-quality training data required for breakthrough AI systems. Headquartered in San Francisco, the company is backed by leading investors including SoftBank, Andreessen Horowitz, B Capital, Gradient Ventures, Databricks Ventures, and Kleiner Perkins.

About Upcraft

Founded in 2021 and headquartered in Chicago, Upcraft builds AI-powered sales agents that automate outreach, qualification, and engagement workflows. The company's technology enables sales teams to scale personalized communication while reducing operational overhead.

SOURCE Labelbox
Separately, Labelbox has announced the Labelbox Evaluation Studio, a private, real-time evaluation platform built for AI labs and model development teams.
Chinese AI lab DeepSeek has released an updated reasoning model, R1-0528, which is reported to perform well in math and coding benchmarks. However, concerns have been raised regarding the potential use of data from Google's Gemini AI family in training this model. Developer Sam Paech, based in Melbourne, shared evidence on social media indicating that R1-0528 shows similarities to Google's Gemini 2.5 Pro. Another developer, known for creating SpeechMap, also noted that the reasoning patterns of R1-0528 resemble those of Gemini AI. DeepSeek has not disclosed the sources of data used for training the model.

🔗 Source: TechCrunch

🧠 Food for thought

1️⃣ Model distillation creates an ethical gray area amid fierce AI competition

Distillation, the process of training smaller models using outputs from larger ones, has become a contentious but widespread practice in AI development, especially for companies with limited computing resources. While distillation itself is a legitimate technique, DeepSeek's alleged use of competitors' models highlights the intellectual property challenges in AI development, with previous accusations suggesting it used OpenAI's outputs without authorization [1]. This case illustrates a technical reality: companies like DeepSeek that are "short on GPUs and flush with cash" may find it economically rational to create synthetic data from competitors' models rather than building everything from scratch [2]. The increasing adoption of protective measures by major AI labs, such as OpenAI requiring ID verification from supported countries (a list that excludes China) or Google summarizing model traces, demonstrates how seriously these companies view the threat of unauthorized knowledge transfer [3]. These protective measures reflect a broader industry recognition that model weights represent the culmination of substantial investments, making them valuable intellectual property worth safeguarding [4].

2️⃣ AI contamination creates attribution challenges for researchers and companies

The difficulty in definitively proving model copying stems partly from the growing "contamination" of the open web with AI-generated content, making it increasingly challenging to determine a model's true training sources. As content farms flood the internet with AI-generated text and bots populate platforms like Reddit and X, the lines between human-created content and AI outputs are blurring, complicating efforts to create "clean" training datasets [5]. This contamination means that similar word choices and expression patterns across different models might simply reflect training on the same AI-generated web content rather than direct copying [6]. Attribution is further complicated by the fact that many models naturally converge on similar linguistic patterns due to shared training methodologies and objectives, making it difficult to establish definitive evidence of unauthorized distillation [7]. These attribution difficulties have significant implications for intellectual property protection in AI, as companies struggle to determine whether similarities between models indicate legitimate convergence or improper copying [1].

3️⃣ AI security measures signal a shift from open collaboration to competitive protection

The increasing implementation of security measures by AI labs reflects a significant shift in the industry from open collaboration toward protecting competitive advantages in a high-stakes technological race. Major AI companies are implementing increasingly sophisticated protections, such as OpenAI requiring ID verification, Google "summarizing" model traces, and Anthropic explicitly protecting "competitive advantages," signaling a new phase of AI development where knowledge protection trumps open sharing [8]. This defensive posture is emerging in a context where the stakes are enormous.
Training a single large AI model can cost millions of dollars in computing resources and produce emissions equivalent to five cars' lifetimes, making the intellectual property extremely valuable [9]. These protective measures are particularly notable in the context of international AI competition, with some U.S. legislators even proposing criminal penalties for downloading certain Chinese AI models like DeepSeek, highlighting the geopolitical dimensions of AI development [10]. The tension between collaboration and protection reflects a maturing AI industry where companies increasingly view their training methodologies and model capabilities as critical competitive assets rather than academic research to be openly shared [3].
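The distillation technique discussed above is, at its core, training a student model to match a teacher's softened output distribution. A minimal NumPy sketch of the standard distillation loss (KL divergence between temperature-scaled softmax outputs, after Hinton et al.) is shown below; the function names and temperature default are illustrative, not any lab's actual training code:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Mean KL(teacher || student) over softened distributions.

    A higher temperature T exposes more of the teacher's "dark knowledge"
    (relative probabilities of wrong classes). The T^2 factor keeps gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, T)  # soft targets from the large model
    q = softmax(student_logits, T)  # student's softened predictions
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(np.mean(kl) * T * T)
```

When the student's logits match the teacher's, the loss is zero; the further its distribution drifts from the teacher's, the larger the penalty. In practice this term is usually mixed with an ordinary cross-entropy loss on hard labels.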