Full-Time

Staff Software Engineer

AI Platform

Confirmed live in the last 24 hours

Snorkel AI

501-1,000 employees

AI development through programmatic solutions

No salary listed

Senior, Expert

San Francisco, CA, USA + 1 more

More locations: San Carlos, CA, USA

Hybrid work environment with 3 days per week at our Redwood City HQ and SF Office.

Category
Backend Engineering
Software Engineering
Required Skills
LLM
Python
Pytorch
Requirements
  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, or a related field.
  • 8+ years of engineering experience, a substantial portion of it in AI development, including hands-on work with AI in production systems.
  • Strong expertise in Python and deep learning frameworks such as PyTorch.
  • Proven experience leading technical projects and mentoring engineers.
  • Proficiency with CI/CD pipelines for machine learning workflows.
  • Deep understanding of LLM architectures, fine-tuning, and deployment methodologies.
  • Strong communication skills and a focus on scalable, reliable system design.
Responsibilities
  • Provide technical direction for the design and development of AI pipelines, ensuring scalability, robustness, and extensibility.
  • Serve as a mentor and guide for engineers on the AI Platform team, fostering growth and technical excellence.
  • Identify and drive high-impact projects aligned with business objectives and product needs.
  • Architect, design, and maintain AI pipelines for labeling, embeddings, training, and deploying models into production.
  • Lead the development and optimization of MLflow pipelines for deployment.
  • Build and deploy foundational models that serve as the backbone for SnorkelFlow’s core product capabilities.
  • Partner with the Compute Platform team to ensure seamless integration with orchestration tools and infrastructure.
  • Develop and deploy LLM-based systems for production workflows, focusing on efficiency, scalability, and reproducibility.
  • Create AI training framework pipelines that enable LLM usage in applications, including fine-tuning, pruning, distillation, and foundational model training.
  • Integrate APIs from providers such as OpenAI, Anthropic, and Gemini into SnorkelFlow’s pipelines.
  • Oversee the integration of backend services for managing LLM calls and API interactions.
  • Collaborate with the Data Platform team to define data requirements and ensure smooth interoperability.
  • Work with the Application team to design and implement APIs that power application workflows.
  • Establish observability standards for AI pipelines, including tools and dashboards for monitoring model performance and debugging.
  • Define key metrics for system health and optimization.
  • Act as a thought leader, collaborating with Data Platform, Compute Platform, Application, Product and other internal teams to deliver cohesive, scalable solutions.
Desired Qualifications
  • Expertise in NLP and familiarity with libraries such as Hugging Face Transformers, spaCy, scikit-learn, or XGBoost.
  • Familiarity with multimodal AI concepts, including vision and audio tasks.
  • Experience working with APIs and foundational model providers such as OpenAI, Anthropic, or Gemini.
  • Knowledge of MLOps tools and practices, such as MLflow, Kubernetes, or Ray.
  • Experience building APIs or SDKs for AI services.

Snorkel AI focuses on enhancing AI development by transforming traditional manual processes into programmatic solutions. This allows businesses to create AI systems that are specifically designed for their unique workloads in a much shorter time frame. The company caters to a wide range of clients, including major US banks, government agencies, and Fortune 500 companies. Snorkel AI's approach is distinct because it leverages proprietary data and knowledge to speed up the deployment of AI technologies. Their technology, which originated from research at Stanford's AI lab, is already in use by prominent organizations like Google, Intel, and IBM. The company generates revenue through contracts and partnerships with enterprises, and it is based in Palo Alto, supported by notable investors.

Company Size

501-1,000

Company Stage

Series D

Total Funding

$235M

Headquarters

Redwood City, California

Founded

2019

Simplify Jobs

Simplify's Take

What believers are saying

  • Snorkel AI raised $100 million in Series D funding at a $1.3 billion valuation.
  • The company serves five of the top ten US banks and various government agencies.
  • Snorkel AI's new product offerings enhance AI development from prototype to production.

What critics are saying

  • Emerging competition from companies like DeepSeek poses a risk to Snorkel AI.
  • AI-generated content contamination complicates clean training dataset creation.
  • Rapid expansion may pose integration and strategic alignment challenges for Snorkel AI.

What makes Snorkel AI unique

  • Snorkel AI uses programmatic data labeling to accelerate AI development.
  • The platform transforms proprietary data into AI-ready datasets efficiently.
  • Snorkel AI's technology originated from Stanford AI Lab research.


Benefits

Health - Snorkelers and their dependents are covered by comprehensive medical, dental, and vision plans.

Environment - We provide an allowance for Snorkelers to set up workstations however they want.

Wellness - Snorkelers are given a yearly wellness stipend to be used on anything relating to health and well-being.

Growth & Insights and Company News

Headcount

6 month growth

15%

1 year growth

0%

2 year growth

-4%
Tech in Asia
Jun 4th, 2025
DeepSeek’s New AI Model May Be Trained on Google’s Gemini

Chinese AI lab DeepSeek has released an updated reasoning model, R1-0528, which is reported to perform well on math and coding benchmarks. However, concerns have been raised over the potential use of data from Google’s Gemini AI family in training the model. Developer Sam Paech, based in Melbourne, shared evidence on social media indicating that R1-0528’s outputs show similarities to Google’s Gemini 2.5 Pro. Another developer, known for creating SpeechMap, also noted that R1-0528’s reasoning patterns resemble those of Gemini. DeepSeek has not disclosed the sources of the data used to train the model.

🔗 Source: TechCrunch

🧠 Food for thought

1️⃣ Model distillation creates an ethical gray area amid fierce AI competition

Distillation, the practice of training smaller models on the outputs of larger ones, has become a contentious but widespread technique in AI development, especially for companies with limited computing resources. While distillation itself is legitimate, DeepSeek’s alleged use of competitors’ models highlights the intellectual property challenges in AI development; previous accusations suggest the lab used OpenAI’s outputs without authorization [1]. The case illustrates a technical reality: for companies like DeepSeek that are “short on GPUs and flush with cash,” it may be economically rational to create synthetic data from competitors’ models rather than build everything from scratch [2]. The protective measures now being adopted by major AI labs, such as OpenAI requiring ID verification from a list of countries that excludes China and Google summarizing model traces, show how seriously these companies take the threat of unauthorized knowledge transfer [3]. These measures reflect a broader industry recognition that model weights represent the culmination of substantial investment, making them valuable intellectual property worth safeguarding [4].

2️⃣ AI contamination creates attribution challenges for researchers and companies

The difficulty of definitively proving model copying stems partly from the growing “contamination” of the open web with AI-generated content, which makes it increasingly hard to determine a model’s true training sources. As content farms flood the internet with AI-generated text and bots populate platforms like Reddit and X, the line between human-created content and AI output blurs, complicating efforts to build “clean” training datasets [5]. This contamination means that similar word choices and expression patterns across different models might simply reflect training on the same AI-generated web content rather than direct copying [6]. Attribution is further complicated by the fact that many models naturally converge on similar linguistic patterns through shared training methodologies and objectives, making definitive evidence of unauthorized distillation hard to establish [7]. These difficulties have significant implications for intellectual property protection in AI, as companies struggle to determine whether similarities between models indicate legitimate convergence or improper copying [1].

3️⃣ AI security measures signal a shift from open collaboration to competitive protection

The spread of security measures across AI labs reflects a significant industry shift from open collaboration toward protecting competitive advantages in a high-stakes technological race. Major AI companies are implementing increasingly sophisticated protections, with OpenAI requiring ID verification, Google “summarizing” model traces, and Anthropic explicitly protecting “competitive advantages,” signaling a new phase of AI development in which knowledge protection trumps open sharing [8]. This defensive posture is emerging in a context where the stakes are enormous: training a single large AI model can cost millions of dollars in computing resources and produce emissions equivalent to five cars’ lifetimes, making the resulting intellectual property extremely valuable [9]. The measures are particularly notable amid international AI competition, with some U.S. legislators even proposing criminal penalties for downloading certain Chinese AI models such as DeepSeek, highlighting the geopolitical dimensions of AI development [10]. The tension between collaboration and protection reflects a maturing industry in which companies increasingly view their training methodologies and model capabilities as critical competitive assets rather than academic research to be shared openly [3].

Recent DeepSeek developments
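The distillation idea discussed above, training a small student model to match a larger teacher's output distributions, can be sketched numerically. Below is a minimal, illustrative computation of the soft-target loss (temperature-scaled KL divergence, as in Hinton et al.'s formulation), in plain Python rather than a real training loop; the function names are my own, not from any library mentioned here.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    A higher temperature exposes more of the teacher's "dark knowledge"
    (the relative probabilities it assigns to wrong classes). The T^2
    factor keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)   # soft targets
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# A student whose logits match the teacher's incurs zero loss;
# any mismatch yields a positive penalty to minimize during training.
teacher = [3.0, 1.0, 0.2]
print(distillation_loss(teacher, teacher))              # 0.0
print(distillation_loss([2.0, 2.0, 2.0], teacher) > 0)  # True
```

In a real pipeline this loss would be computed over batches of teacher and student logits and combined with the ordinary cross-entropy against hard labels, but the soft-target term above is the core of the technique.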

Crunchbase
May 30th, 2025
The Week's Biggest Funding Rounds: Another Billion-Dollar AI Raise Leads List That Includes Lots of Biotech and More AI

ClickHouse, $350M, analytics: ClickHouse, a provider of analytics, data warehousing and machine learning technology, raised $350 million in a Series C financing led by Khosla Ventures.

Business Wire
May 30th, 2025
Snorkel AI Announces $100 Million Series D and Expanded Platform to Power Next Phase of AI with Expert Data

Today, Snorkel AI announced general availability of two new product offerings on the Snorkel AI Data Development Platform: Snorkel Evaluate and Snorkel Expert Data-as-a-Service.

Bitvoxy
May 30th, 2025
Snorkel AI Raises $100 Million at $1.3 Billion Valuation to Accelerate Specialized AI Deployment

Snorkel AI raises $100 million at $1.3 billion valuation to accelerate specialized AI deployment.

Bakersfield.com
May 29th, 2025
Snorkel AI Secures $100M Series D Funding

Snorkel AI announced the launch of Snorkel Evaluate and Snorkel Expert Data-as-a-Service on its Data Development Platform, aimed at enhancing AI development from prototype to production. Additionally, Snorkel AI secured $100 million in Series D funding at a $1.3 billion valuation, led by Addition. The funding will support further research and innovation in specialized AI systems using expert data.