Patronus AI

Provides AI safety tools and solutions

About Patronus AI

Simplify's Rating: A-

  • Competitive Edge: B
  • Growth Potential: A
  • Differentiation: A

Industries

Enterprise Software

AI & Machine Learning

Company Size

11-50

Company Stage

Series A

Total Funding

$20M

Headquarters

New York City, New York

Founded

2023

Overview

Patronus AI provides tools that help businesses and organizations use artificial intelligence (AI) safely and confidently. Its products focus on AI safety, helping clients understand and manage the risks associated with AI technologies. The tools are designed to be adaptable, so clients can adjust their usage to their specific needs. Patronus AI operates on a subscription-based model: clients pay for access to these safety tools, which gives the company recurring revenue. Patronus AI distinguishes itself from competitors through its emphasis on customer relationships and its commitment to understanding client needs, which fosters trust and long-term partnerships. The company's goal is to create a trustworthy AI landscape in which clients can integrate AI into their operations effectively while prioritizing safety.

📈
Significant Headcount Growth
Simplify Jobs

Simplify's Take

What believers are saying

  • Patronus AI's tools address critical AI issues like hallucinations and agent reliability.
  • Subscription model ensures steady revenue and scalability for diverse client needs.
  • Innovative products like Lynx and Percival position Patronus AI as a market leader.

What critics are saying

  • Emerging competitors may offer more cost-effective AI safety solutions.
  • Dependence on Google's Gemini model could pose licensing risks.
  • Rapid AI advancements may outpace Patronus AI's current tool capabilities.

What makes Patronus AI unique

  • Patronus AI offers unique AI safety tools like Percival for agent reliability.
  • Their Judge-Image tool is the first multimodal LLM-as-a-Judge for image evaluation.
  • Glider model provides detailed AI decision explanations, enhancing trust in AI systems.

Funding

Total Funding

$20M (meets industry average), funded over 2 rounds

Series A funding typically happens when a startup has a product and some customers, and now needs funding to scale. This money is usually used to grow the team, expand marketing, and improve the product. Venture capital firms are frequently the main investors here.

Series A Funding Comparison: Above Average

  • Industry standard: $15M
  • Discord: $8.2M
  • Canva: $15M
  • Patronus AI: $17M
  • Kalshi: $30M

Benefits

Health Insurance

Dental Insurance

Vision Insurance

401(k) Retirement Plan

Unlimited Paid Time Off

Growth & Insights and Company News

Headcount

6 month growth

6%

1 year growth

10%

2 year growth

0%

VentureBeat
May 14th, 2025
Patronus AI Debuts Percival to Help Enterprises Monitor Failing AI Agents at Scale

Patronus AI launched a new monitoring platform today that automatically identifies failures in AI agent systems, targeting enterprise concerns about reliability as these applications grow more complex.

The San Francisco-based AI safety startup’s new product, Percival, positions itself as the first solution capable of automatically identifying various failure patterns in AI agent systems and suggesting optimizations to address them.

“Percival is the industry’s first solution that automatically detects a variety of failure patterns in agentic systems and then systematically suggests fixes and optimizations to address them,” said Anand Kannappan, CEO and co-founder of Patronus AI, in an exclusive interview with VentureBeat.

AI agent reliability crisis: Why companies are losing control of autonomous systems

Enterprise adoption of AI agents (software that can independently plan and execute complex multi-step tasks) has accelerated in recent months, creating new management challenges as companies try to ensure these systems operate reliably at scale.

Unlike conventional machine learning models, these agent-based systems often involve lengthy sequences of operations where errors in early stages can have significant downstream consequences.

“A few weeks ago, we published a model that quantifies how likely agents can fail, and what kind of impact that might have on the brand, on customer churn and things like that,” Kannappan said. “There’s a constant compounding error probability with agents that we’re seeing.”

This issue becomes particularly acute in multi-agent environments where different AI systems interact with one another, making traditional testing approaches increasingly inadequate.

Episodic memory innovation: How Percival’s AI agent architecture revolutionizes error detection

Percival differentiates itself from other evaluation tools through its agent-based architecture and what the company calls “episodic memory,” the ability to learn from previous errors and adapt to specific workflows.

The software can detect more than 20 different failure modes across four categories: reasoning errors, system execution errors, planning and coordination errors, and domain-specific errors.

“Unlike an LLM as a judge, Percival itself is an agent and so it can keep track of all the events that have happened throughout the trajectory,” explained Darshan Deshpande, a researcher at Patronus AI. “It can correlate them and find these errors across contexts.”

For enterprises, the most immediate benefit appears to be reduced debugging time. According to Patronus, early customers have reduced the time spent analyzing agent workflows from about one hour to between one and 1.5 minutes.

TRAIL benchmark reveals critical gaps in AI oversight capabilities

Alongside the product launch, Patronus is releasing a benchmark called TRAIL (Trace Reasoning and Agentic Issue Localization) to evaluate how well systems can detect issues in AI agent workflows.

Research using this benchmark revealed that even sophisticated AI models struggle with effective trace analysis, with the best-performing system scoring only 11% on the benchmark.

The findings underscore the challenging nature of monitoring complex AI systems and may help explain why large enterprises are investing in specialized tools for AI oversight.

Enterprise AI leaders embrace Percival for mission-critical agent applications

Early adopters include Emergence AI, which has raised approximately $100 million in funding and is developing systems where AI agents can create and manage other agents.

“Emergence’s recent breakthrough, agents creating agents, marks a pivotal moment not only in the evolution of adaptive, self-generating systems, but also in how such systems are governed and scaled responsibly,” said Satya Nitta, co-founder and CEO of Emergence AI, in a statement sent to VentureBeat.

Nova, another early customer, is using the technology for a platform that helps large enterprises migrate legacy code through AI-powered SAP integrations.

These customers typify the challenge Percival aims to solve

VentureBeat
Mar 13th, 2025
Patronus AI’s Judge-Image Wants to Keep AI Honest, and Etsy Is Already Using It

Patronus AI announced today the launch of what it calls the industry’s first multimodal large language model-as-a-judge (MLLM-as-a-Judge), a tool designed to evaluate AI systems that interpret images and produce text.

The new evaluation technology aims to help developers detect and mitigate hallucinations and reliability issues in multimodal AI applications. E-commerce giant Etsy has already implemented the technology to verify caption accuracy for product images across its marketplace of handmade and vintage goods.

“Super excited to announce that Etsy is one of our ship customers,” said Anand Kannappan, cofounder of Patronus AI, in an exclusive interview with VentureBeat. “They have hundreds of millions of items in their online marketplace for handmade and vintage products that people are creating around the world. One of the things that their AI team wanted to be able to leverage generative AI for was the ability to auto-generate image captions and to make sure that as they scale across their entire global user base, that the captions that are generated are ultimately correct.”

Why Google’s Gemini powers the new AI judge rather than OpenAI

Patronus built its first MLLM-as-a-Judge, called Judge-Image, on Google’s Gemini model after extensive research comparing it with alternatives like OpenAI’s GPT-4V.

“We tended to see that there was a slighter preference toward egocentricity with GPT-4V, whereas we saw that Gemini was less biased in those ways and had more of an equitable approach to being able to judge different kinds of input-output pairs,” Kannappan explained

PR Newswire
Mar 13th, 2025
Patronus AI Launches Industry-First Multimodal LLM-as-a-Judge for Image Evaluation

E-commerce giant Etsy already leveraging technology to reduce AI hallucinations in product image captions

SAN FRANCISCO, March 13, 2025 /PRNewswire/ -- Patronus AI today announced the launch of the industry's first Multimodal LLM-as-a-Judge (MLLM-as-a-Judge), a groundbreaking evaluation capability that enables developers to score and optimize multimodal AI systems for image-to-text applications.

The new Judge-Image tool, powered by Google Gemini, allows AI engineers to iteratively measure and improve the quality of their multimodal AI applications by scanning for text presence, grid structure, spatial orientation, and object identification.

"Our mission has always been to advance scalable oversight of AI," said Anand Kannappan, CEO and Co-founder of Patronus AI. "With the release of GPT-4o, Claude Opus, and Google's Gemini over the last year, organizations have invested heavily in image generation to drive customer value. However, as these AI experiences scale, so does the unpredictability of LLM systems. Our MLLM-as-a-Judge addresses this critical challenge by providing transparent, reliable evaluation of multimodal systems."

The Judge-Image tool offers several out-of-box evaluation criteria, including:

  • Caption hallucination detection (standard and strict)
  • Primary and non-primary object description verification
  • Object location accuracy

Beyond validating image caption correctness, Judge-Image can test OCR extraction accuracy for tabular data, AI-generated brand asset accuracy, and scene description validity.

Prior research suggests that Google Gemini can serve as a more reliable MLLM judge compared to alternatives like OpenAI's GPT-4V, exhibiting less egocentricity and a more equitable approach to judgment. Patronus AI's internal evaluation datasets confirmed that the Gemini backbone performed better compared to other multimodal LLMs.

Patronus AI plans to expand their multimodal evaluation capabilities to include audio and vision features in future releases.

Customer Use Case

Etsy, the leading technology marketplace for independent sellers, has already implemented Patronus AI's MLLM-as-a-Judge to detect and mitigate caption hallucination from their product images. The Etsy AI team leverages this and the broader Patronus platform to optimize their multimodal AI system.

For more information, visit the Patronus AI documentation at https://docs.patronus.ai/docs/multimodal_evals/base.

About Patronus AI

Patronus AI develops AI evaluation and optimization to help companies build top-tier AI products confidently

Yahoo Finance
Mar 13th, 2025
Patronus AI Launches Industry-First Multimodal LLM-as-a-Judge for Image Evaluation

Patronus AI launches industry-first Multimodal LLM-as-a-Judge for image evaluation.

TalkDev
Dec 20th, 2024
Patronus AI Unveils GLIDER: A Revolutionary Model for Transparent Language Evaluation

Patronus AI has released GLIDER, a 3.8 billion parameter model designed for evaluating language models.
