Full-Time
Posted on 5/8/2025
AI development through programmatic solutions
$220k - $300k/yr
Senior, Expert
San Francisco, CA, USA; New York, NY, USA
Candidates must be based in the US, specifically in New York or the San Francisco Bay Area.
Snorkel AI focuses on improving artificial intelligence development by transforming traditionally manual processes into automated, programmatic solutions. This allows businesses to build AI systems tailored to their unique needs far more quickly. Its technology is used by a wide range of clients, including major US banks, government agencies, and large corporations. Snorkel AI stands out from competitors by offering proprietary data and knowledge that speed up the deployment of AI solutions. The company's goal is to make AI development more efficient and accessible for enterprises, leveraging its roots in research at the Stanford AI Lab.
Company Size
501-1,000
Company Stage
Series D
Total Funding
$235M
Headquarters
Redwood City, California
Founded
2019
Health - Snorkelers and their dependents are covered by comprehensive medical, dental, and vision plans.
Environment - We provide an allowance for Snorkelers to set up workstations however they want.
Wellness - Snorkelers are given a yearly wellness stipend to be used on anything relating to health and well-being.
Chinese AI lab DeepSeek has released an updated reasoning model, R1-0528, which reportedly performs well on math and coding benchmarks. However, concerns have been raised that data from Google's Gemini AI family may have been used to train it.

Developer Sam Paech, based in Melbourne, shared evidence on social media indicating that R1-0528 shows similarities to Google's Gemini 2.5 Pro. Another developer, known for creating SpeechMap, also noted that R1-0528's reasoning patterns resemble those of Gemini models. DeepSeek has not disclosed the sources of the data used to train the model.

🔗 Source: TechCrunch

🧠 Food for thought

1️⃣ Model distillation creates an ethical gray area amid fierce AI competition

Distillation, the process of training smaller models on the outputs of larger ones, has become a contentious but widespread practice in AI development, especially for companies with limited computing resources.

While distillation itself is a legitimate technique, DeepSeek's alleged use of competitors' models highlights the intellectual property challenges of AI development; previous accusations suggested the company used OpenAI's outputs without authorization [1].

This case illustrates a technical reality: companies like DeepSeek that are "short on GPUs and flush with cash" may find it economically rational to generate synthetic data from competitors' models rather than build everything from scratch [2].

The growing adoption of protective measures by major AI labs, such as OpenAI requiring ID verification from a list of supported countries that excludes China and Google summarizing model traces, shows how seriously these companies take the threat of unauthorized knowledge transfer [3].

These protective measures reflect a broader industry recognition that model weights represent the culmination of substantial investments, making them valuable intellectual property worth safeguarding [4].

2️⃣ AI contamination creates attribution challenges for researchers and companies

The difficulty of definitively proving model copying stems partly from the growing "contamination" of the open web with AI-generated content, which makes it increasingly hard to determine a model's true training sources.

As content farms flood the internet with AI-generated text and bots populate platforms like Reddit and X, the line between human-created content and AI output is blurring, complicating efforts to assemble "clean" training datasets [5].

This contamination means that similar word choices and expression patterns across different models may simply reflect training on the same AI-generated web content rather than direct copying [6].

Attribution is further complicated by the fact that many models naturally converge on similar linguistic patterns due to shared training methodologies and objectives, making it difficult to establish definitive evidence of unauthorized distillation [7].

These attribution difficulties have significant implications for intellectual property protection in AI, as companies struggle to determine whether similarities between models indicate legitimate convergence or improper copying [1].

3️⃣ AI security measures signal a shift from open collaboration to competitive protection

The spread of security measures among AI labs reflects a significant industry shift from open collaboration toward protecting competitive advantages in a high-stakes technological race.

Major AI companies are implementing increasingly sophisticated protections, such as OpenAI requiring ID verification, Google "summarizing" model traces, and Anthropic explicitly protecting "competitive advantages", signaling a new phase of AI development in which knowledge protection trumps open sharing [8].

This defensive posture is emerging in a context where the stakes are enormous.
Training a single large AI model can cost millions of dollars in computing resources and produce emissions equivalent to five cars' lifetimes, making the resulting intellectual property extremely valuable [9].

These protective measures are particularly notable amid international AI competition: some US legislators have even proposed criminal penalties for downloading certain Chinese AI models such as DeepSeek, highlighting the geopolitical dimensions of AI development [10].

The tension between collaboration and protection reflects a maturing AI industry in which companies increasingly view their training methodologies and model capabilities as critical competitive assets rather than academic research to be shared openly [3].

Recent DeepSeek developments
ClickHouse, $350M, analytics: ClickHouse, a provider of analytics, data warehousing and machine learning technology, raised $350 million in a Series C financing led by Khosla Ventures.
Today, Snorkel AI announced general availability of two new product offerings on the Snorkel AI Data Development Platform: Snorkel Evaluate and Snorkel Exper...
Snorkel AI raises $100 million at $1.3 billion valuation to accelerate specialized AI deployment.
Snorkel AI announced the launch of Snorkel Evaluate and Snorkel Expert Data-as-a-Service on its Data Development Platform, aimed at enhancing AI development from prototype to production. Additionally, Snorkel AI secured $100 million in Series D funding at a $1.3 billion valuation, led by Addition. The funding will support further research and innovation in specialized AI systems using expert data.