Full-Time
Posted on 9/15/2025
Develops foundation models with subquadratic architectures
$180k - $250k/yr
San Francisco, CA, USA
In Person
Cartesia.ai develops advanced AI foundation models using new subquadratic architectures and state space models. Its models are designed as underlying systems that can be adapted for many applications and are released in part under open-source licenses (e.g., Apache 2.0), with licensing, partnerships, and consulting as revenue streams. The company’s products work by providing large, adaptable AI models that clients can license or customize for specific needs, enabling faster processing and efficient inference compared to traditional designs. Cartesia differentiates itself from competitors by using subquadratic, state space approaches instead of relying mainly on standard Transformer models, and by combining open-source releases with enterprise partnerships. The goal is to help businesses and research institutions access powerful, adaptable AI capabilities while building a community around its technology and sustaining revenue through licensing and services.
Company Size
51-200
Company Stage
Late Stage VC
Total Funding
$191M
Headquarters
San Francisco, California
Founded
2023
Health Insurance
Dental Insurance
Vision Insurance
401(k) Retirement Plan
401(k) Company Match
Relocation Assistance
Smallest.ai launches Lightning V3, a new text-to-speech model that beats OpenAI, Cartesia, and ElevenLabs on key voice quality benchmarks. Designed for real-time use, it combines multilingual speech, voice cloning from seconds of audio, and conversational-level prosody in a single system.
San Francisco, CA | March 27, 2026 - Smallest.ai, the research-first Voice AI company building proprietary speech models and production-grade voice agents, today announced the launch of Lightning V3, its most advanced text-to-speech (TTS) model for real-time, conversational AI. In conversational evaluations, Lightning V3 achieves a 3.89 MOS, outperforming leading models from OpenAI, Cartesia, and ElevenLabs, while also leading on intonation (3.33) and prosody (3.07), two of the most critical factors for natural, human-like speech. The model combines this performance with multilingual support, instant voice cloning, and streaming generation designed for real-world interactions.
Most TTS models today are still evaluated on complete sentences generated in isolation. That setup is easier to optimize for, but it doesn't reflect how voice systems actually behave in production, where audio is generated in chunks, context is incomplete, and responses have to adapt as conversations unfold. Lightning V3 is built for how voice systems actually run in production: generating speech in chunks, without full context, and adapting as conversations evolve. It maintains consistency across turns and adjusts tone and pacing mid-sentence, which is where most systems break down.
That same setup allows the model to work across use cases without retraining, including voice agents, contact centers, podcasts, audiobooks, dubbing, and interactive applications. It supports 15 languages with automatic detection and mid-sentence switching, and can clone a voice from 5-15 seconds of audio.
These cloned voices tend to sound more natural than preset ones, since they retain the variations of real speech. The model outputs audio at 44.1 kHz, and can be downsampled to 8-24 kHz for telephony.
"Conversation is where most voice systems fall apart," said Sudarshan Kamath, Founder and CEO, Smallest.ai. "It's not just about sounding clear; the voice has to track context, timing, and emotion at the same time. If it works there, it works everywhere."
A shift in how voice quality is measured. The launch also challenges how voice models are evaluated. Most benchmarks rely on static outputs, a setup that rarely reflects real usage. Lightning V3 is evaluated in use-case-specific settings, measuring how well the voice maintains coherence, responsiveness, and believability throughout an interaction, in the context of the conversation and not just within a single utterance. Voices should be designed and judged in context: for whether they fit the persona they are meant to inhabit, carry the right social signal, and feel believable in the moment they were built for.
Pricing. Lightning V3.1 is available on a pay-as-you-go model, with no upfront commitments, seat licenses, or minimum usage requirements. Teams can scale from early prototypes to high-volume deployments across both voice agents and content generation, with usage-based pricing and non-expiring credits.
About Smallest.ai. Smallest.ai is a research-first Voice AI company building proprietary speech models and production-grade voice agents for regulated enterprises. The company develops state-of-the-art speech-to-text, text-to-speech, and real-time voice systems, enabling end-to-end automation of high-volume conversations across support, collections, onboarding, and servicing, without relying on stitched third-party APIs.
Designed for financial services and other regulated industries, Smallest.ai is SOC 2, GDPR, HIPAA, and PCI compliant, supports on-prem and private cloud deployments, and operates reliably in multilingual environments. Its platform is used in production by enterprises across banking, insurance, BPO, and telecommunications in the US and India.
Cartesia has raised $100 million to develop its AI model, Sonic-3, which uses State Space Models (SSMs) instead of the widely used transformer architecture. This marks a significant shift in AI conversation models, as transformers, known for their attention mechanism, are commonly used in text, audio, and image recognition models.
Cartesia raises $100 million to transform real-time voice AI with Sonic-3.
Silicon Valley startup Cartesia has secured a $100 million funding round from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA. Co-founded by Stanford AI Lab alumni Karan Goel and Albert Gu, Cartesia is launching Sonic-3, a real-time conversational AI model. Sonic-3 redefines what's possible in voice AI by delivering a combination of uniqueness, speed, and multilingual support. It captures the full emotional range of human speech, including laughter, tone variation, and subtle emotional shifts, making conversations feel deeply authentic and engaging. It also boasts lightning-fast performance, with a model latency of just 90 milliseconds and a total end-to-end response time of 190 milliseconds, placing it among the fastest real-time voice AI systems available. Its global reach is equally impressive, supporting 42 languages, enabling enterprises to deploy truly global, natural voice applications that meet diverse market needs.
Unlike most voice AI solutions that rely on Transformer architectures, Sonic-3 is built on State Space Models (SSMs). Traditional Transformer-based models process conversations by re-reviewing all preceding dialogue to predict each next word, similar to replaying the entire conversation repeatedly. This approach introduces latency and inefficiency. SSMs, pioneered by Cartesia's founders at Stanford (with innovations like S4 and Mamba), function more like human memory. They retain an ongoing understanding of the topic and conversational vibe without replaying everything from scratch for each response. This enables Sonic-3 to generate speech that is both natural and fast.
"If you're qualified and we can't make your voice AI better than what you're using now, I'll donate $5K to your chosen charity," said Karan Goel. Thousands of companies, including ServiceNow, Cresta, and Decagon, trust Sonic to power millions of voice interactions monthly.
Cartesia's platform enables enterprises to build voice agents capable of complex tasks such as customer support, scheduling, and even lighthearted pranks, all with human-like expressiveness. To encourage adoption, Cartesia offers free trials and demos, as well as an 11-page guide on cloning voices and creating AI agents in under 10 minutes. Additionally, new users receive $100 in free credits to experiment with voice AI applications. The $100 million raise highlights growing investor confidence in Cartesia's technology and business potential. With capital from Silicon Valley titans like Kleiner Perkins and NVIDIA, Cartesia plans to expand its engineering team, scale product development, and extend its global reach.
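The article's contrast between replaying the whole conversation and carrying a running memory can be illustrated with a toy linear recurrence. The sketch below is only an illustration under stated assumptions, not Cartesia's Sonic-3, S4, or Mamba: `ScalarSSM`, `run_streaming`, and `output_from_full_history` are hypothetical names, the state is a single number instead of a learned high-dimensional one, and the full-history function is a stand-in for reprocessing past inputs, not an attention mechanism.

```python
from typing import List


class ScalarSSM:
    """Toy one-dimensional linear state space model (illustrative only)."""

    def __init__(self, a: float, b: float, c: float) -> None:
        self.a, self.b, self.c = a, b, c
        self.h = 0.0  # running state that summarises everything seen so far

    def step(self, x: float) -> float:
        # Constant work per input: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.
        self.h = self.a * self.h + self.b * x
        return self.c * self.h


def run_streaming(a: float, b: float, c: float, xs: List[float]) -> List[float]:
    """Feed inputs one at a time, keeping only the running state."""
    model = ScalarSSM(a, b, c)
    return [model.step(x) for x in xs]


def output_from_full_history(a: float, b: float, c: float, xs: List[float]) -> float:
    """Output at the last step, recomputed by revisiting every past input.

    Work grows with len(xs); this mimics "replaying the conversation".
    """
    t = len(xs) - 1
    return c * sum((a ** (t - k)) * b * x for k, x in enumerate(xs))


if __name__ == "__main__":
    inputs = [1.0, 2.0, 3.0]
    print(run_streaming(0.5, 1.0, 2.0, inputs))
    print(output_from_full_history(0.5, 1.0, 2.0, inputs))
```

With `a = 0.5`, `b = 1.0`, `c = 2.0` and inputs 1, 2, 3, both paths give 8.5 at the last step. The streaming path does a fixed amount of work per input, while the full-history path does more work as the sequence grows.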
Poe's latest usage report shows OpenAI and Google strengthening their positions in key AI categories while Anthropic loses ground and specialized reasoning capabilities emerge as a crucial competitive battleground.
According to data released today by Poe, a platform offering access to more than 100 AI models, significant market share shifts occurred across all major AI categories between January and May 2025. The data, drawn from Poe subscribers, provides rare visibility into actual user preferences beyond industry benchmarks.
“As a universal gateway to 100+ AI models, Poe has a unique view of usage trends across the ecosystem,” said Nick Huber, Poe’s AI Ecosystem Lead, in an exclusive interview with VentureBeat. “The most surprising things happening right now are rapid innovation (3x the number of releases Jan-May 2025 vs. the same period in 2024), an increasingly diverse competitive landscape, and reasoning models are the clear success story of early 2025.”
A chart from Poe showing AI model rankings across different categories as of May 2025
Forethought has joined forces with Cartesia, a leader in real-time voice AI, to enhance its voice AI agents and deliver high-quality conversational experiences.