Industries: Enterprise Software, AI & Machine Learning
Company Size: 11-50
Company Stage: Series A
Total Funding: $20M
Headquarters: New York City, New York
Founded: 2023
Patronus AI provides tools that help businesses and organizations use artificial intelligence (AI) safely and confidently. Their products focus on AI safety, helping clients understand and manage the risks associated with AI technologies. The tools are designed to be adaptable, allowing clients to adjust their usage based on their specific needs. Patronus AI operates on a subscription-based model, where clients pay for access to these safety tools, ensuring a steady income for the company. What sets Patronus AI apart from competitors is its strong emphasis on customer relationships and a commitment to understanding client needs, which fosters trust and long-term partnerships. The company's goal is to create a trustworthy AI landscape, enabling clients to integrate AI into their operations effectively while prioritizing safety.
Total Funding: $20M (meets industry average; raised over 2 rounds)
Industry-standard benefits:
Health Insurance
Dental Insurance
Vision Insurance
401(k) Retirement Plan
Unlimited Paid Time Off
Patronus AI launched a new monitoring platform today that automatically identifies failures in AI agent systems, targeting enterprise concerns about reliability as these applications grow more complex.

The San Francisco-based AI safety startup's new product, Percival, positions itself as the first solution capable of automatically identifying various failure patterns in AI agent systems and suggesting optimizations to address them.

"Percival is the industry's first solution that automatically detects a variety of failure patterns in agentic systems and then systematically suggests fixes and optimizations to address them," said Anand Kannappan, CEO and co-founder of Patronus AI, in an exclusive interview with VentureBeat.

AI agent reliability crisis: Why companies are losing control of autonomous systems

Enterprise adoption of AI agents, software that can independently plan and execute complex multi-step tasks, has accelerated in recent months, creating new management challenges as companies try to ensure these systems operate reliably at scale.

Unlike conventional machine learning models, these agent-based systems often involve lengthy sequences of operations where errors in early stages can have significant downstream consequences.

"A few weeks ago, we published a model that quantifies how likely agents can fail, and what kind of impact that might have on the brand, on customer churn and things like that," Kannappan said. "There's a constant compounding error probability with agents that we're seeing."

This issue becomes particularly acute in multi-agent environments where different AI systems interact with one another, making traditional testing approaches increasingly inadequate.

Episodic memory innovation: How Percival's AI agent architecture revolutionizes error detection

Percival differentiates itself from other evaluation tools through its agent-based architecture and what the company calls "episodic memory," the ability to learn from previous errors and adapt to specific workflows.

The software can detect more than 20 different failure modes across four categories: reasoning errors, system execution errors, planning and coordination errors, and domain-specific errors.

"Unlike an LLM as a judge, Percival itself is an agent and so it can keep track of all the events that have happened throughout the trajectory," explained Darshan Deshpande, a researcher at Patronus AI. "It can correlate them and find these errors across contexts."
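Kannappan's compounding-error point can be made concrete with a little arithmetic: if an agent completes each step of a trajectory with some independent per-step success probability p, the whole n-step run succeeds with probability p^n, so even very reliable steps erode quickly over long trajectories. A minimal sketch (the 99% per-step reliability and step counts below are illustrative assumptions, not Patronus figures):

```python
# Illustrative only: per-step reliability compounds multiplicatively over a trajectory.
def trajectory_success_rate(per_step_success: float, num_steps: int) -> float:
    """Probability an n-step agent run completes with no step failing,
    assuming independent, identically reliable steps."""
    return per_step_success ** num_steps

for steps in (5, 20, 50, 100):
    print(f"{steps:>3} steps at 99% per-step reliability -> "
          f"{trajectory_success_rate(0.99, steps):.1%} end-to-end")
# 5 steps -> 95.1%, 20 -> 81.8%, 50 -> 60.5%, 100 -> 36.6%
```

Under these assumptions a 100-step trajectory fails almost two-thirds of the time, which is why long, multi-agent workflows are hard to test with conventional spot checks.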
For enterprises, the most immediate benefit appears to be reduced debugging time. According to Patronus, early customers have reduced the time spent analyzing agent workflows from about one hour to between one and 1.5 minutes.

TRAIL benchmark reveals critical gaps in AI oversight capabilities

Alongside the product launch, Patronus is releasing a benchmark called TRAIL (Trace Reasoning and Agentic Issue Localization) to evaluate how well systems can detect issues in AI agent workflows.

Research using this benchmark revealed that even sophisticated AI models struggle with effective trace analysis, with the best-performing system scoring only 11% on the benchmark.

The findings underscore the challenging nature of monitoring complex AI systems and may help explain why large enterprises are investing in specialized tools for AI oversight.

Enterprise AI leaders embrace Percival for mission-critical agent applications

Early adopters include Emergence AI, which has raised approximately $100 million in funding and is developing systems where AI agents can create and manage other agents.

"Emergence's recent breakthrough—agents creating agents—marks a pivotal moment not only in the evolution of adaptive, self-generating systems, but also in how such systems are governed and scaled responsibly," said Satya Nitta, co-founder and CEO of Emergence AI, in a statement sent to VentureBeat.

Nova, another early customer, is using the technology for a platform that helps large enterprises migrate legacy code through AI-powered SAP integrations. These customers typify the challenge Percival aims to solve.
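The coverage above does not spell out TRAIL's scoring protocol, so the following is only a rough sketch of what "issue localization" scoring could look like: a system gets credit only when it both names the right failure category and points at the right step of the trace. The schema and exact-match rule are assumptions for illustration, not the benchmark's actual metric:

```python
# Hypothetical scoring sketch: an issue counts as "localized" only if the
# predicted failure category matches ground truth AND points at the same step.
from dataclasses import dataclass

@dataclass(frozen=True)
class Issue:
    category: str   # e.g. "reasoning", "system_execution", "planning", "domain"
    step: int       # index of the offending event in the agent trace

def localization_score(predicted: set[Issue], gold: set[Issue]) -> float:
    """Fraction of ground-truth issues exactly matched by a prediction."""
    if not gold:
        return 1.0
    return len(predicted & gold) / len(gold)

gold = {Issue("reasoning", 3), Issue("system_execution", 7), Issue("planning", 12)}
pred = {Issue("reasoning", 3), Issue("system_execution", 9)}   # one hit, one near miss
print(f"localization score: {localization_score(pred, gold):.0%}")  # 33%
```

A metric this strict makes an 11% top score more plausible: a judge must find the error, classify it, and pin it to the right point in a long trace.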
Patronus AI announced today the launch of what it calls the industry's first multimodal large language model-as-a-judge (MLLM-as-a-Judge), a tool designed to evaluate AI systems that interpret images and produce text.

The new evaluation technology aims to help developers detect and mitigate hallucinations and reliability issues in multimodal AI applications. E-commerce giant Etsy has already implemented the technology to verify caption accuracy for product images across its marketplace of handmade and vintage goods.

"Super excited to announce that Etsy is one of our ship customers," said Anand Kannappan, cofounder of Patronus AI, in an exclusive interview with VentureBeat. "They have hundreds of millions of items in their online marketplace for handmade and vintage products that people are creating around the world. One of the things that their AI team wanted to be able to leverage generative AI for was the ability to auto-generate image captions and to make sure that as they scale across their entire global user base, that the captions that are generated are ultimately correct."

Why Google's Gemini powers the new AI judge rather than OpenAI

Patronus built its first MLLM-as-a-Judge, called Judge-Image, on Google's Gemini model after extensive research comparing it with alternatives like OpenAI's GPT-4V.

"We tended to see that there was a slighter preference toward egocentricity with GPT-4V, whereas we saw that Gemini was less biased in those ways and had more of an equitable approach to being able to judge different kinds of input-output pairs," Kannappan explained.
E-commerce giant Etsy already leveraging technology to reduce AI hallucinations in product image captions

SAN FRANCISCO, March 13, 2025 /PRNewswire/ -- Patronus AI today announced the launch of the industry's first Multimodal LLM-as-a-Judge (MLLM-as-a-Judge), a groundbreaking evaluation capability that enables developers to score and optimize multimodal AI systems for image-to-text applications.

The new Judge-Image tool, powered by Google Gemini, allows AI engineers to iteratively measure and improve the quality of their multimodal AI applications by scanning for text presence, grid structure, spatial orientation, and object identification.

"Our mission has always been to advance scalable oversight of AI," said Anand Kannappan, CEO and Co-founder of Patronus AI. "With the release of GPT-4o, Claude Opus, and Google's Gemini over the last year, organizations have invested heavily in image generation to drive customer value. However, as these AI experiences scale, so does the unpredictability of LLM systems. Our MLLM-as-a-Judge addresses this critical challenge by providing transparent, reliable evaluation of multimodal systems."

The Judge-Image tool offers several out-of-the-box evaluation criteria, including:

- Caption hallucination detection (standard and strict)
- Primary and non-primary object description verification
- Object location accuracy

Beyond validating image caption correctness, Judge-Image can test OCR extraction accuracy for tabular data, AI-generated brand asset accuracy, and scene description validity.

Prior research suggests that Google Gemini can serve as a more reliable MLLM judge than alternatives like OpenAI's GPT-4V, exhibiting less egocentricity and a more equitable approach to judgment. Patronus AI's internal evaluation datasets confirmed that the Gemini backbone performed better than other multimodal LLMs.

Patronus AI plans to expand its multimodal evaluation capabilities to include audio and vision features in future releases.

Customer Use Case

Etsy, the leading technology marketplace for independent sellers, has already implemented Patronus AI's MLLM-as-a-Judge to detect and mitigate caption hallucination from its product images. The Etsy AI team leverages this and the broader Patronus platform to optimize its multimodal AI system.

For more information, visit the Patronus AI documentation at https://docs.patronus.ai/docs/multimodal_evals/base.

About Patronus AI

Patronus AI develops AI evaluation and optimization tools to help companies build top-tier AI products confidently.
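The press release does not show the calling convention, so the snippet below is only a shape sketch of what an image-to-text evaluation request might carry. The endpoint, field names, and evaluator id are hypothetical, not the actual Patronus AI API; the documentation linked above describes the real interface:

```python
import requests

# Hypothetical request shape -- NOT the actual Patronus AI API.
# The real interface is documented at https://docs.patronus.ai/docs/multimodal_evals/base
payload = {
    "evaluator": "judge-image",                      # hypothetical evaluator id
    "criteria": "caption-hallucination-strict",      # hypothetical criterion name
    "evaluated_model_input": "https://example.com/product-photo.jpg",
    "evaluated_model_output": "A hand-thrown blue ceramic mug on a wooden table",
}
resp = requests.post(
    "https://api.example.com/v1/evaluate",           # placeholder endpoint
    json=payload,
    headers={"Authorization": "Bearer <API_KEY>"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # expect something like a pass/fail verdict plus an explanation
```

The key idea is the pairing: the judge receives the model's input (the image) alongside its output (the caption) and scores the output against a named criterion such as strict hallucination detection.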
Patronus AI launches industry-first Multimodal LLM-as-a-Judge for image evaluation.
Patronus AI has released GLIDER, a 3.8 billion parameter model designed for evaluating language models.
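At 3.8 billion parameters, an evaluator like GLIDER is small enough to run locally. Below is a minimal sketch using Hugging Face transformers; the checkpoint id "PatronusAI/glider" is an assumption to verify on the model hub, and the prompt is illustrative only, since the model card defines the exact rubric template the model expects:

```python
# Sketch: running a small judge model locally with Hugging Face transformers.
# The checkpoint id "PatronusAI/glider" is an assumption -- verify on the model hub.
from transformers import pipeline

judge = pipeline("text-generation", model="PatronusAI/glider", device_map="auto")

prompt = (
    "Pass criteria: The response answers the question without unsupported claims.\n"
    "Question: What is the capital of France?\n"
    "Response: Paris.\n"
    "Score the response against the pass criteria and explain your reasoning."
)
result = judge(prompt, max_new_tokens=256, return_full_text=False)
print(result[0]["generated_text"])
```

The appeal of a small dedicated judge over an API-hosted frontier model is cost and data locality: evaluation traffic can stay on your own hardware.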