Full-Time

Sales - Healthcare

Posted on 5/21/2025

Protege

51-200 employees

AI training data marketplace and licensing

No salary listed

Company Does Not Provide H1B Sponsorship

Locations: San Francisco, CA, USA | Los Angeles, CA, USA | New York, NY, USA

Remote

Candidates based in major US cities such as NYC, LA, and the Bay Area are likely preferred.

Category
Sales & Account Management
Required Skills
Sales
Requirements
  • Drive and ability to get things done
  • Success as a first sales hire, or closing deals in a fast-paced environment with limited resources to help you
  • Ability to structure creative deals for win-win-wins
  • Familiarity with data in healthcare or the ability to learn it quickly
  • Ability to communicate complex information clearly to diverse audiences
  • You treat those around you with kindness
Responsibilities
  • Close deals with data buyers in healthcare, owning the end-to-end sales process from prospecting through negotiation and closing
  • Cultivate strategic partnerships to expand our network and data offerings
  • Help us continue to segment the market and define priorities among those segments
  • Collaborate with our CEO and Head of Operations to define the strategy and resourcing plan for the healthcare vertical
Desired Qualifications
  • (Nice to have) Success selling for a B2B marketplace company and/or navigating channel conflict

Protege operates as a data marketplace that connects data holders with AI developers to enable secure, compliant exchange of training data for AI model development. Its platform streamlines data procurement by listing diverse and curated datasets, including specialized healthcare data, and facilitating data licensing deals between buyers and data owners. The acquisition of Calliope Networks adds premium video content, expanding into multimodal data to support generative AI models. Revenue likely comes from transaction fees on licensing deals or subscription access for data buyers. The goal is to provide ready access to high-quality, diverse training data while ensuring data security and regulatory compliance, helping AI teams accelerate model development.

Company Size

51-200

Company Stage

Series A

Total Funding

$65M

Headquarters

New York City, New York

Founded

2024

Simplify Jobs

Simplify's Take

What believers are saying

  • a16z-led $30M Series A extension on January 8, 2026, brings total funding to $65M.
  • Partnerships with OneMedNet and Shaip expand real-time healthcare datasets.
  • DataLab has released multimodal healthcare benchmarks and is working on problems ranging from cancer diagnostics to audio de-identification.

What critics are saying

  • Scale AI, with over $1B in funding behind its labeling business, could capture Magnificent 7 clients within 6-12 months.
  • FTC and HIPAA rule updates could block 40% of healthcare revenue from de-identified data within 12-18 months.
  • Snorkel AI enables synthetic data generation, which could disrupt real-world data licensing within 6-12 months.

What makes Protege unique

  • Protege's DataLab brings scientific rigor to AI dataset design with Magnificent 7 collaborations.
  • Unlike labeling-focused competitors, its ethical, revenue-sharing marketplace licenses proprietary multimodal data.
  • Acquisition of Calliope Networks adds premium video for generative AI training.

Benefits

Remote Work Options

Growth & Insights and Company News

Headcount

6-month growth

18%

1-year growth

5%

2-year growth

50%

HPCwire
Mar 12th, 2026
Protege Launches DataLab to Turn AI Data Into a Scientific Discipline

At BigDATAwire, we have extensively covered how progress in AI is no longer driven only by larger models or more compute. Increasingly, the real constraint is the data itself, and that bottleneck becomes harder to solve as systems grow more complex and move closer to real-world use.

Protege, an AI data platform focused on delivering large-scale, real-world datasets for model training, is building on that idea with the launch of DataLab, a new research initiative intended to bring more rigor, standards, and scientific discipline to the AI data layer. DataLab is essentially a research institution intended to help AI researchers navigate the growing challenges around the science of data, including dataset quality, selection, representation, and evaluation. The DataLab team includes in-house experts and researchers. Protege claims that most of the Magnificent 7 tech companies and multiple frontier AI labs are participating in early training and evaluation data collaborations. The Magnificent 7 are Amazon, Apple, Alphabet, Microsoft, Nvidia, Meta, and Tesla.

The launch of DataLab comes at a time when AI development is increasingly shaped by data limitations. A recent Snowflake survey found strong returns from GenAI projects but also highlighted ongoing challenges around data readiness and quality. This reinforces the idea that future progress will depend as much on better data as on larger models or more compute.

"We understand the three core pillars driving AI: models, chips, and data. We are convinced that with the right datasets - the third, underdeveloped pillar - you can push the entire frontier forward," said Bobby Samuels, CEO of Protege. Samuels says that the company "created DataLab to treat data as infrastructure, not exhaust." He argues that building more capable, reliable systems requires better standards, reproducibility, and real scientific discipline at the data layer.

DataLab's three core areas are scientific partnerships, building high-value datasets and data products, and leading AI data research. The work is expected to span both academic research and commercial applications, with the lab exploring new product opportunities while also publishing benchmarks and technical studies.

"The strength of DataLab is its ability to integrate perspectives that are often siloed," said Engy Ziedan, Co-Founder and Chief Scientific Officer at Protege. "Advancing AI requires more than larger models or more data alone." Ziedan emphasizes that this "requires thinking at the margin, where we weigh the marginal value of a datapoint on learning and the opportunity cost of choosing the wrong dataset." Ziedan believes the team is structured to deliver the disciplined dataset design, careful evaluation, and deep understanding of real-world complexity that frontier AI requires.

One reason efforts like DataLab are getting attention now is that AI systems are moving into areas where mistakes are harder to tolerate. Training a model on internet data is one thing; training systems that help with scientific workflows requires a very different level of precision. The challenge is not just finding more data, but finding the right data, understanding how it was collected, and knowing how it affects the outcome of a model.

Researchers have also started to focus more on the marginal value of data, meaning how much a single dataset or even a single datapoint changes model behavior. As models become larger, simply adding more information does not always improve results. In some cases, the wrong data can reduce performance or introduce errors that are difficult to detect. That makes dataset design a technical problem in its own right rather than just a prep step before training.
Protege says DataLab is meant to work at that level, where decisions about what data to include, how to structure it, and how to measure its impact can determine whether a system performs reliably outside the lab. As AI moves further into real-world use, that layer is becoming harder to ignore.

Business Wire
Mar 11th, 2026
Protege launches DataLab to advance AI data science with backing from Magnificent 7 companies

Protege, an AI data platform, has launched DataLab, a research institution focused on advancing the science of AI training and evaluation data. At launch, a majority of the "Magnificent 7" AI companies and major frontier AI labs are collaborating with DataLab on various projects. Led by Engy Ziedan, Protege's co-founder and chief scientific officer, DataLab addresses data quality, selection and methodology challenges as AI development becomes increasingly constrained by data limitations. The institution operates across three areas: scientific partnerships with AI researchers, building high-value datasets, and conducting AI data research. DataLab has already released multimodal healthcare benchmark datasets and is working with frontier AI organisations on challenges ranging from advanced cancer diagnostics to audio de-identification. The initiative aims to bring scientific rigour to dataset design and establish reproducible methodologies for more reliable AI systems.

SuperbCrew
Jan 9th, 2026
Protege Raises $30M in Series A Funding Led By a16z

Protege raised $30 million in a Series A extension round led by Andreessen Horowitz (a16z), with participation from returning investors including Footwork, CRV, Bloomberg Beta, Flex Capital, and Shaper Capital, bringing the company's total funding to $65 million since its 2024 founding. Protege operates as an AI data platform that facilitates access to trusted, real-world datasets at scale. Founded in 2024 by Bobby Samuels (CEO), Travis May (Chairman, with prior experience at Datavant and LiveRamp), Richard Ho (CTO), and Engy Ziedan (Chief Scientific Officer), the company emphasizes ethical sourcing of multimodal data, including de-identified health records, medical imaging, audio recordings, and media content. It curates datasets for AI training and evaluation, partnering with data providers through licensing agreements and offering revenue-sharing models. By 2025, Protege had expanded its network to hundreds of organizations and was supporting workflows for leading AI institutions worldwide. This latest round builds on a $25 million Series A in August 2025 led by Footwork and a $10 million seed round led by CRV. The extension underscores rapid adoption in industries facing data shortages for AI development. a16z Partner Daisy Wolf noted that Protege's approach respects data complexities while enabling modern AI use, highlighting a market shift toward unlocking data responsibly. CEO Bobby Samuels emphasized the platform's role in supplying curated, AI-ready data amid fragmented sources, while Chairman Travis May pointed to proprietary data as the driver of AI's next phase.

Protege's $30 million Series A extension round, led by Andreessen Horowitz (a16z) and announced on January 8, 2026, marks a significant milestone in the company's trajectory, bringing its total funding to $65 million since its inception in 2024.
This funding builds directly on the momentum from Protege's prior raises, including a $25 million Series A in August 2025 led by Footwork and a $10 million seed round led by CRV, reflecting investor confidence in its mission to address one of AI's most pressing challenges: access to high-quality, real-world data. At its core, Protege functions as a governed marketplace and data infrastructure platform that connects data holders (such as organizations in healthcare, media, audio, and motion capture) with AI developers seeking proprietary, multimodal datasets for training, fine-tuning, and evaluation. Founded by a seasoned team including CEO Bobby Samuels, Chairman Travis May (former CEO of Datavant and LiveRamp), CTO Richard Ho, and Chief Scientific Officer Engy Ziedan, the company had rapidly scaled its partner network to hundreds of organizations by 2025, emphasizing ethical licensing, data curation, anonymization, and revenue-sharing models that compensate providers based on usage. This approach differentiates Protege from reliance on public or synthetic data, focusing instead on real-world sources that capture authentic human and system behaviors across domains like video, imaging, gaming, manufacturing, life sciences, real estate, finance, and education.

The funding round arrives amid a broader industry shift in which AI progress is increasingly constrained by data availability rather than compute or model architecture. Public datasets have been largely exhausted, and the internet's scrapable content has reached its limits, pushing developers toward fragmented, proprietary sources that are often inaccessible because of privacy, intellectual property, and operational hurdles. Protege addresses these hurdles by streamlining the discovery, filtering, and combination of datasets with built-in compliance and transparency, effectively shortening delivery timelines from years to months in sectors like healthcare.
As Travis May put it, "Access to data is the biggest bottleneck to the advancement of AI. The next phase of AI will be driven by real world, proprietary data generated through everyday human activity." Similarly, Bobby Samuels highlighted the demand-supply imbalance: "We're seeing demand for real world data grow faster than the market's ability to supply it responsibly." Investor enthusiasm, particularly from a16z, stems from Protege's proven product-market fit, as evidenced by its collaborations with foundation model builders, including the majority of the Magnificent Seven tech giants. Daisy Wolf of a16z remarked, "The next era of AI will be shaped by who can responsibly unlock access to the world's most valuable data," underscoring the platform's role in navigating complex data landscapes. The capital will fuel specific initiatives: accelerating product features for data cleaning and formatting, broadening coverage into new verticals, enhancing partnerships, and expanding the team across roles such as data scientists, engineers, and operations.

In the competitive landscape, Protege stands out by concentrating on data aggregation and ethical exchange, unlike broader AI tooling providers. Key competitors include Scale AI, which offers data labeling and annotation services; Snorkel AI, focused on programmatic data labeling for machine learning; and Labelbox, which provides tools for data labeling and management. These players address adjacent needs but lack Protege's emphasis on proprietary, real-world data marketplaces with revenue-sharing and compliance layers.

Looking ahead, this funding positions Protege to influence AI's evolution, potentially shaping standards for data valuation, licensing, and ethical AI development, as reflected in its participation in events such as CES 2026 panels on AI copyright and data valuation.
Reactions from the tech community, including congratulatory notes on LinkedIn and X, indicate strong support for Protege's vision amid predictions that media licensing for AI could reach hundreds of millions of dollars in revenue in 2026 alone.

Funding History Table

| Round | Amount Raised | Lead Investor | Date | Participating Investors | Cumulative Total |
|---|---|---|---|---|---|
| Seed | $10 million | CRV | 2024 (exact date not specified) | Not detailed | $10 million |
| Series A | $25 million | Footwork | August 2025 | CRV, Bloomberg Beta, Flex Capital, Shaper Capital | $35 million |
| Series A Extension | $30 million | Andreessen Horowitz (a16z) | January 8, 2026 | Footwork, CRV, Bloomberg Beta, Flex Capital, Shaper Capital | $65 million |

| Investor | Role in Latest Round | Notable Background/Contributions |
|---|---|---|
| Andreessen Horowitz (a16z) | Lead | Focus on AI infrastructure; Partner Daisy Wolf emphasizes data's role in AI's future. |
| Footwork | Returning Participant | Led previous Series A; Co-founder Nikhil Basu Trivedi supports data-focused ventures. |
| CRV | Returning Participant | Led seed round; Partner Saar Gur invests in enterprise tech and AI. |
| Bloomberg Beta | Returning Participant | Early-stage AI and data investments aligned with media and tech. |
| Flex Capital | Returning Participant | Focus on scalable tech platforms. |
| Shaper Capital | Returning Participant | Supports innovative data and AI startups. |

| Company | Primary Focus | Key Differentiation from Protege | Funding/Scale Highlights |
|---|---|---|---|
| Scale AI | Data labeling and annotation for AI models | Broader tooling, including human-in-the-loop labeling; less emphasis on proprietary data marketplaces. | Raised over $1 billion; serves major AI firms. |
| Snorkel AI | Programmatic data labeling and management | Focuses on weak supervision and custom labeling pipelines; not centered on real-world data aggregation. | $135 million in funding; enterprise-oriented. |
| Labelbox | Data labeling platform with collaboration tools | Emphasizes workflow for labeling teams; lacks revenue sharing for data providers. | $188 million raised; integrates with ML frameworks. |

This round not only validates Protege's model but also signals a maturing AI data ecosystem in which ethical, scalable access could unlock trillions in economic value, though ongoing debates around data privacy and ownership will shape its long-term impact.

SiliconANGLE Media
Jan 8th, 2026
Protege raises $30M to grow governed marketplace for AI training data

TechStartups.com
Aug 13th, 2025
Top Startup and Tech Funding News - August 13, 2025

August Health locked in $29 million to advance its AI-powered senior care platform, while AI data platform Protege raised $25 million in Series A funding.

INACTIVE