Full-Time
Provides data labeling solutions for AI
$130k - $260k/yr
Junior, Mid
San Francisco, CA, USA
Hybrid model with 2 days per week in office.
Labelbox offers data labeling solutions for artificial intelligence applications, helping businesses label images, videos, text, and documents efficiently. Their tools create workflows that manage labeling tasks, ensuring high-quality results for clients in industries like agriculture, healthcare, and technology. Operating on a software-as-a-service (SaaS) model, Labelbox generates revenue through subscription fees and additional workforce services. The company's goal is to enhance AI development by providing effective data labeling solutions that improve the efficiency and quality of AI model training.
Company Size
201-500
Company Stage
Series D
Total Funding
$188.9M
Headquarters
San Francisco, California
Founded
2018
Competitive remuneration
Flexible vacation policy (we don't count PTO Days)
401k Program
College savings account
HSA
Daily lunches paid for by the company (especially convenient while working from home)
Virtual wellness and guided meditation programs
Dog-friendly office
Regular company social events (happy hours, off-sites)
Professional development benefits and resources
Remote friendly (we hire in-office and remote employees)
Chinese AI lab DeepSeek has released an updated reasoning model, R1-0528, which is reported to perform well on math and coding benchmarks. However, concerns have been raised that data from Google's Gemini family of models may have been used in its training. Melbourne-based developer Sam Paech shared evidence on social media indicating that R1-0528's outputs resemble those of Google's Gemini 2.5 Pro, and another developer, known for creating SpeechMap, noted that R1-0528's reasoning patterns also resemble Gemini's. DeepSeek has not disclosed the sources of data used to train the model.

🔗 Source: TechCrunch

🧠 Food for thought

1️⃣ Model distillation creates an ethical gray area amid fierce AI competition

Distillation, the process of training smaller models on the outputs of larger ones, has become a contentious but widespread practice in AI development, especially for companies with limited computing resources. While distillation itself is a legitimate technique, DeepSeek's alleged use of competitors' models highlights the intellectual property challenges it raises; previous accusations suggested the lab used OpenAI's outputs without authorization¹. The case illustrates a technical reality: for companies that are "short on GPUs and flush with cash," generating synthetic training data from competitors' models can be economically rational compared with building everything from scratch². The growing adoption of protective measures by major AI labs, such as OpenAI requiring ID verification from organizations in supported countries (a list that excludes China) and Google summarizing the reasoning traces its models expose, shows how seriously these companies treat the threat of unauthorized knowledge transfer³. These measures reflect a broader industry recognition that model weights embody substantial investment, making them valuable intellectual property worth safeguarding⁴.

2️⃣ AI contamination creates attribution challenges for researchers and companies

The difficulty of definitively proving model copying stems partly from the growing "contamination" of the open web with AI-generated content, which makes it increasingly hard to determine a model's true training sources. As content farms flood the internet with AI-generated text and bots populate platforms like Reddit and X, the line between human-created content and AI output blurs, complicating efforts to build "clean" training datasets⁵. This contamination means that similar word choices and expression patterns across different models might simply reflect training on the same AI-generated web content rather than direct copying⁶. Attribution is further complicated because many models naturally converge on similar linguistic patterns through shared training methodologies and objectives, making definitive evidence of unauthorized distillation hard to establish⁷. These difficulties carry significant implications for intellectual property protection in AI, as companies struggle to determine whether similarities between models indicate legitimate convergence or improper copying¹.

3️⃣ AI security measures signal a shift from open collaboration to competitive protection

The security measures AI labs are implementing reflect a significant industry shift from open collaboration toward protecting competitive advantage in a high-stakes technological race. Major AI companies are deploying increasingly sophisticated protections, including OpenAI's ID verification requirement, Google's "summarizing" of model traces, and Anthropic's explicit protection of its "competitive advantages," signaling a phase of AI development in which knowledge protection trumps open sharing⁸. This defensive posture is emerging in a context where the stakes are enormous: training a single large AI model can cost millions of dollars in computing resources and produce emissions equivalent to five cars' lifetimes, making the resulting intellectual property extremely valuable⁹. The measures are especially notable amid international AI competition; some U.S. legislators have even proposed criminal penalties for downloading certain Chinese AI models such as DeepSeek, underscoring the geopolitical dimensions of AI development¹⁰. The tension between collaboration and protection reflects a maturing AI industry in which companies increasingly view their training methodologies and model capabilities as critical competitive assets rather than academic research to be shared openly³.
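The distillation technique discussed above — training a "student" model to match a "teacher" model's output distribution — can be sketched with the standard soft-label objective: soften both models' output probabilities with a temperature, then penalize the student for diverging from the teacher. This is a minimal pure-Python illustration of that loss, not DeepSeek's or any lab's actual pipeline; the function names and the temperature value are our own assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution, exposing more of the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions:
    # zero when the student reproduces the teacher exactly,
    # positive otherwise (Gibbs' inequality).
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student whose logits match the teacher's incurs (near-)zero loss;
# a mismatched student incurs a positive loss that training would minimize.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

In practice this loss would be minimized by gradient descent over many teacher-generated examples, which is why querying a competitor's model for outputs at scale is economically attractive to labs short on compute.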
Google’s latest artificial intelligence models could accelerate AI adoption in eCommerce and retail, developers say, as the tech giant unveils upgrades designed to attract more businesses to its Gemini platform. The company announced two updated production-ready models in a Tuesday (Sept. 24) blog post, Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002, which offer enhanced capabilities across a range of tasks, including product recommendations, inventory management and customer service automation. “The new release introduces advanced capabilities in math and vision tasks,” Sujan Abraham, a senior software engineer at AI firm Labelbox, told PYMNTS. “These models are designed for a wide range of tasks, including text, code and multimodal applications. They can process larger and much more complex inputs like 1,000-page PDFs, massive code repos and hour-long videos.”
Labelbox introduces Large Language Model (LLM) solution to help enterprises innovate with generative AI, expands partnership with Google Cloud.
In the next week, Labelbox will release auto-generated model metrics to help you debug your model, find and fix labeling errors, and improve your model's overall performance before it hits production on real-world data.