Full-Time

Software Engineer

Backend

Posted on 8/20/2024

Snorkel AI

501-1,000 employees

Transforms manual AI processes into programmatic solutions

Compensation Overview

$200k - $240k/yr

+ Equity Compensation

Junior, Mid

San Francisco, CA, USA

More locations: San Carlos, CA, USA

Hybrid

Work a hybrid schedule with three days per week in our Redwood City HQ or the SF office and work remotely with 'No Meeting' Tuesdays and Thursdays.

Category
Backend Engineering
Software Engineering
Required Skills
Python
Machine Learning
Data Analysis
Requirements
  • Bachelor's degree in Computer Science or related field
  • 2 years of experience delivering distributed and ML systems and services in a production setting for cloud-native applications
  • Experience with distributed compute frameworks and data processing pipelines
  • Ability to design and build efficient scalable data storage, compute, and retrieval systems for AI/ML tasks
  • Strong communication and coding skills with emphasis on designing for scale and robustness
  • Strong development and debugging skills in Python
Responsibilities
  • Own the architecture, design, development, and operations of large-scale systems designed for AI/ML tasks including distributed compute systems, data management systems, data engineering workflow systems, and end user experiences
  • Help create a 'startup within a startup'
  • Prototype, optimize, and maintain scalable back-end services that will power new ML and foundation-model-powered development workflows
  • Design extensible and testable interfaces between internal services including the underlying storage and data models
  • Keep CI/CD pipelines healthy and support customers in production via engaged on-call support
  • Be an engaged team player in a customer-focused cross-functional environment where you will feel excited to take on whatever is most impactful for the company and product
Desired Qualifications
  • 6 years of professional software engineering experience
  • Experience architecting and developing production web-scale systems (monitoring, telemetry, performance, reliability, triage, and debugging)
  • Experience working with ML systems and foundation models (e.g. large language models)
  • Experience owning delivery of large multi-person multi-quarter projects
  • Experience developing enterprise software products for machine learning and/or data science applications

Snorkel AI focuses on enhancing AI development by transforming manual processes into programmatic solutions, allowing businesses to create AI systems that are specifically designed for their unique workloads at a much faster pace. Its technology, which originated from research at the Stanford AI Lab, is used by major organizations including five of the top ten US banks, various government agencies, and numerous Fortune 500 companies. Snorkel AI's products work by providing proprietary data and knowledge that streamline the deployment of AI, making it easier for enterprises to implement AI solutions. Unlike many competitors, Snorkel AI emphasizes a programmatic approach that reduces the time and effort needed to develop AI applications. The company's goal is to empower organizations to leverage AI effectively and efficiently, ultimately accelerating their digital transformation.

Company Size

501-1,000

Company Stage

Series D

Total Funding

$235M

Headquarters

Redwood City, California

Founded

2019

Simplify Jobs

Simplify's Take

What believers are saying

  • Snorkel AI raised $100 million in Series D funding at a $1.3 billion valuation.
  • The company serves five of the top ten US banks and various government agencies.
  • Snorkel AI's new product offerings enhance AI development from prototype to production.

What critics are saying

  • Emerging competition from companies like DeepSeek poses a risk to Snorkel AI.
  • AI-generated content contamination complicates clean training dataset creation.
  • Rapid expansion may pose integration and strategic alignment challenges for Snorkel AI.

What makes Snorkel AI unique

  • Snorkel AI uses programmatic data labeling to accelerate AI development.
  • The platform transforms proprietary data into AI-ready datasets efficiently.
  • Snorkel AI's technology originated from Stanford AI Lab research.

Benefits

Health - Snorkelers and their dependents are covered by comprehensive medical, dental, and vision plans.

Environment - We provide an allowance for Snorkelers to set up workstations however they want.

Wellness - Snorkelers are given a yearly wellness stipend to be used on anything relating to health and well-being.

Growth & Insights and Company News

Headcount

6 month growth

24%

1 year growth

2%

2 year growth

0%
Tech in Asia
Jun 4th, 2025
DeepSeek’s New AI Model May Be Trained On Google’s Gemini

Chinese AI lab DeepSeek has released an updated reasoning model, R1-0528, which is reported to perform well in math and coding benchmarks. However, concerns have been raised regarding the potential use of data from Google’s Gemini AI family in training this model.

Developer Sam Paech, based in Melbourne, shared evidence on social media indicating that R1-0528 shows similarities to Google’s Gemini 2.5 Pro. Another developer, known for creating SpeechMap, also noted that the reasoning patterns of R1-0528 resemble those of Gemini AI. DeepSeek has not disclosed the sources of data used for training the model.

🔗 Source: TechCrunch

🧠 Food for thought

1️⃣ Model distillation creates an ethical gray area amid fierce AI competition

Distillation, the process of training smaller models using outputs from larger ones, has become a contentious but widespread practice in AI development, especially for companies with limited computing resources. While distillation itself is a legitimate technique, DeepSeek’s alleged use of competitors’ models highlights the intellectual property challenges in AI development, with previous accusations suggesting they used OpenAI’s outputs without authorization [1]. This case illustrates a technical reality: companies like DeepSeek, which are “short on GPUs and flush with cash,” may find it economically rational to create synthetic data from competitors’ models rather than building everything from scratch [2]. The increasing adoption of protective measures by major AI labs, such as OpenAI requiring ID verification from countries that exclude China or Google summarizing model traces, demonstrates how seriously these companies view the threat of unauthorized knowledge transfer [3]. These protective measures reflect a broader industry recognition that model weights represent the culmination of substantial investments, making them valuable intellectual property worth safeguarding [4].

2️⃣ AI contamination creates attribution challenges for researchers and companies

The difficulty in definitively proving model copying stems partly from the growing “contamination” of the open web with AI-generated content, making it increasingly challenging to determine a model’s true training sources. As content farms flood the internet with AI-generated text and bots populate platforms like Reddit and X, the lines between human-created content and AI outputs are blurring, complicating efforts to create “clean” training datasets [5]. This contamination means that similar word choices and expression patterns across different models might simply reflect training on the same AI-generated web content rather than direct copying [6]. The challenges of attribution are further complicated by the fact that many models naturally converge on similar linguistic patterns due to shared training methodologies and objectives, making it difficult to establish definitive evidence of unauthorized distillation [7]. These attribution difficulties create significant implications for intellectual property protection in AI, as companies struggle to determine whether similarities between models indicate legitimate convergence or improper copying [1].

3️⃣ AI security measures signal a shift from open collaboration to competitive protection

The increasing implementation of security measures by AI labs reflects a significant shift in the industry from open collaboration toward protecting competitive advantages in a high-stakes technological race. Major AI companies are implementing increasingly sophisticated protections, such as OpenAI requiring ID verification, Google “summarizing” model traces, and Anthropic explicitly protecting “competitive advantages,” signaling a new phase of AI development where knowledge protection trumps open sharing [8]. This defensive posture is emerging in a context where the stakes are enormous. Training a single large AI model can cost millions in computing resources and produce emissions equivalent to five cars’ lifetimes, making the intellectual property extremely valuable [9]. These protective measures are particularly notable in the context of international AI competition, with some U.S. legislators even proposing criminal penalties for downloading certain Chinese AI models like DeepSeek, highlighting the geopolitical dimensions of AI development [10]. The tension between collaboration and protection reflects a maturing AI industry where companies increasingly view their training methodologies and model capabilities as critical competitive assets rather than academic research to be openly shared [3].

Crunchbase
May 30th, 2025
The Week's Biggest Funding Rounds: Another Billion-Dollar AI Raise Leads List That Includes Lots Of Biotech And More AI

ClickHouse, $350M, analytics: ClickHouse, a provider of analytics, data warehousing and machine learning technology, raised $350 million in a Series C financing led by Khosla Ventures.

Business Wire
May 30th, 2025
Snorkel AI Announces $100 Million Series D and Expanded Platform to Power Next Phase of AI with Expert Data

Today, Snorkel AI announced general availability of two new product offerings on the Snorkel AI Data Development Platform: Snorkel Evaluate and Snorkel Exper...

Bitvoxy
May 30th, 2025
Snorkel AI Raises $100 Million at $1.3 Billion Valuation to Accelerate Specialized AI Deployment

Snorkel AI raises $100 million at $1.3 billion valuation to accelerate specialized AI deployment.

Bakersfield.com
May 29th, 2025
Snorkel AI Secures $100M Series D Funding

Snorkel AI announced the launch of Snorkel Evaluate and Snorkel Expert Data-as-a-Service on its Data Development Platform, aimed at enhancing AI development from prototype to production. Additionally, Snorkel AI secured $100 million in Series D funding at a $1.3 billion valuation, led by Addition. The funding will support further research and innovation in specialized AI systems using expert data.

INACTIVE