Full-Time
APIs for fast, scalable speech transcription
No salary listed
Remote in UK + 3 more
More locations: Remote in Germany | Remote in Ireland | Remote in Spain
Remote
Remote across EMEA with periodic regional meetups.
Deepgram provides AI-powered speech recognition APIs that developers can integrate into their apps to transcribe and understand audio content. Its APIs process audio data to produce transcripts and extract insights, offering fast, accurate, scalable, and cost-effective transcription for users ranging from startups to large enterprises (including NASA) with large daily audio volumes. The product works by sending audio to Deepgram’s cloud API, where the service returns text and other understanding signals; customers pay based on the amount of audio processed (pay-per-use), allowing revenue to grow with usage. Compared with competitors, Deepgram emphasizes reliable performance at scale, enterprise-friendly support, and a simple API-based model rather than on-premises or heavy client-side processing. The company’s goal is to enable organizations to turn large amounts of audio into usable text and insights easily and affordably by providing a scalable API platform for speech recognition.
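The request/response flow and pay-per-use pricing described above can be sketched roughly as follows. This is a minimal illustration, not Deepgram's official client: the endpoint URL, the `Token` authorization scheme, and the response layout are assumptions to check against the current API documentation, and the per-minute rate is a made-up placeholder. The helper names are hypothetical.

```python
import urllib.request

# Assumption: pre-recorded transcription endpoint; confirm in the API docs.
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def build_request(audio: bytes, api_key: str,
                  content_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    return urllib.request.Request(
        DEEPGRAM_URL,
        data=audio,
        method="POST",
        headers={
            # Assumption: API keys are passed with the "Token" scheme.
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
    )


def extract_transcript(response_body: dict) -> str:
    """Read the first transcript from a response with the assumed JSON layout."""
    return response_body["results"]["channels"][0]["alternatives"][0]["transcript"]


def estimate_cost(audio_seconds: float, rate_per_minute: float) -> float:
    """Pay-per-use: cost grows linearly with the amount of audio processed."""
    return audio_seconds / 60.0 * rate_per_minute
```

To actually send the request, one would pass the result of `build_request` to `urllib.request.urlopen` and decode the JSON body before calling `extract_transcript`. The sketch stops short of that so it can be read and tested without a network connection or an API key.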
Company Size
201-500
Company Stage
Series C
Total Funding
$233.3M
Headquarters
San Francisco, California
Founded
2015
Flexible Work Hours
Mistral now offers a new open-source model for speech generation.
On Thursday, the French AI startup Mistral unveiled a new open-source text-to-speech model that can be used in enterprise use cases such as customer care or by voice AI assistants. The release, which enables businesses to build voice assistants for sales and customer engagement, puts Mistral in direct competition with companies like ElevenLabs, Deepgram, and OpenAI. Voxtral TTS is the first open-source text-to-speech (TTS) model from Mistral AI. It was introduced as lightweight enough to run locally on edge devices like laptops, smartphones, and smartwatches, and it supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic. "A speech model has been requested by our clients. Therefore, we developed a compact speech model that can be used on laptops, smartphones, smartwatches, and other edge devices," Pierre Stock, vice president of science operations at Mistral AI, said in a phone conversation with a press outlet. "It offers state-of-the-art performance at a fraction of the cost of anything else on the market." According to Mistral, the new model can capture features such as subtle accents, inflections, intonations, and irregularities in speech flow from a sample of less than five seconds. For use cases like dubbing or real-time translation, the model, which is based on Ministral 3B, can switch between languages without losing the voice's characteristics. According to Stock, the company wanted the model to sound human rather than robotic. The company says the model was designed for real-time performance. Its time-to-first-audio (TTFA), which measures how soon the model begins "speaking" after receiving input, is 90 ms for a 10-second sample of 500 characters. The model also has a real-time factor (RTF) of 6x, which lets it render a 10-second clip in about 1.6 seconds.
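The latency figures quoted above can be checked with a few lines of arithmetic. This sketch assumes the article uses "RTF of 6x" to mean a speedup over real time (audio duration divided by processing time). Many speech papers define RTF the other way around, as processing time divided by audio duration. The function names are illustrative only.

```python
def render_time_seconds(audio_seconds: float, speedup: float) -> float:
    """Time needed to synthesize a clip when generation runs `speedup` times faster than real time."""
    return audio_seconds / speedup


def implied_speedup(audio_seconds: float, render_seconds: float) -> float:
    """Speedup over real time implied by a measured render time."""
    return audio_seconds / render_seconds


# A 10-second clip at exactly 6x takes 10 / 6 seconds to render.
clip_render = render_time_seconds(10.0, 6.0)

# Conversely, rendering 10 seconds of audio in 1.6 seconds implies 10 / 1.6 = 6.25x.
speedup = implied_speedup(10.0, 1.6)
```

At exactly 6x, the render time is about 1.67 seconds, slightly above the quoted "about 1.6 seconds". The quoted render time corresponds to a speedup of 6.25x, so the two figures agree once rounding is taken into account.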
Key features of Voxtral TTS:
* Edge-friendly: 4 billion parameters, runs on just 3 GB RAM.
* Low latency: 90 ms time-to-first-audio for real-time use.
* Voice cloning: Adapts to any voice with under 5 seconds of audio.
* Multilingual: Supports nine languages, including English, Spanish, Hindi, and Arabic.
* Expressive: Delivers human-like speech with emotion and varied tone.
Mistral introduced two transcription models earlier this year, one for large batch processing and the other for low-latency real-time use cases. The company's goal with the new speech model is probably to offer businesses a complete range of voice solutions. "We intend to create an end-to-end platform that can manage multimodal input streams, such as text, audio, and images, as well as output. The primary advantage is that an end-to-end agentic system that allows audio as an input or output gives you a lot more information," Stock said. As for positioning, Mistral is betting that because its speech models are open source and customizable, businesses will be more likely to choose them over rivals' offerings. Voxtral TTS completes a full suite of voice AI products, following Mistral's recent release of speech-to-text (transcription) models, including Voxtral Realtime and Voxtral Mini Transcribe V2. Because the model is released with open weights under an Apache 2.0 license, it can be deployed privately and on-device without relying on the cloud.
Penguin Solutions selected by Deepgram to enable deployment of optimized AI inference infrastructure for Enterprise Voice AI.
Strategic collaboration leverages Dell PowerEdge servers and NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs to deliver high-performance, low-latency voice experiences for mission-critical applications in healthcare and retail.
FREMONT, Calif.--(BUSINESS WIRE)--Penguin Solutions, Inc. (Nasdaq: PENG), the AI factory platform company, today announced a strategic collaboration with Deepgram and Dell Technologies to architect and deploy a fully optimized, production-ready infrastructure aligned to Deepgram's demanding enterprise voice AI requirements. By leveraging its unique expertise in designing, building, deploying, and managing AI infrastructure with Dell PowerEdge servers and Dell PowerScale storage optimized for AI workloads, Penguin Solutions delivered an optimal solution to support and enhance Deepgram's innovative Speech-to-Text (STT), Text-to-Speech (TTS), and Voice Agent capabilities, while ensuring maximum reliability and performance. As enterprise adoption of generative AI accelerates, organizations must adhere to stricter service level agreements (SLAs), which require infrastructure that can ensure low latency and high concurrent usage. This Penguin-led deployment addresses these challenges by combining Deepgram's innovative voice AI models with a purpose-built architectural design, a highly efficient deployment, and ongoing performance optimization. "Modern AI workloads demand infrastructure that performs consistently and scales predictably under heavy loads, particularly for real-time inference applications like voice agents," said Joe Castillo, vice president of sales at Penguin Solutions. "By partnering with Deepgram and utilizing proven Dell AI infrastructure, Penguin Solutions is delivering a validated, scalable, end-to-end architecture.
Our comprehensive framework equips Deepgram with the optimized infrastructure needed to reliably and accurately deliver complex voice AI capabilities in healthcare, retail, and other industries." Drawing on its extensive experience with HPC and AI infrastructure, Penguin Solutions ensures that the underlying infrastructure meets the specific demands of Deepgram's neural networks. The architecture also incorporates Dell PowerScale storage and Dell PowerEdge XE7745 servers with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, which provide efficient inferencing that enables data-intensive voice applications to operate seamlessly in real-time environments. "Deepgram is focused on delivering voice AI capabilities that meet the demanding performance, scalability, and reliability requirements of enterprise environments - something only Deepgram brings to the market today," said Abe Pursell, vice president of partnerships and business development at Deepgram. "The infrastructure behind our platform has to be equally robust to support that level of innovation. Penguin Solutions demonstrated a deep understanding of our technical requirements, translating them into a sophisticated infrastructure environment that meets and exceeds expectations. This enables us to continue delivering the enterprise-class capabilities our customers rely on." "AI-driven voice applications are transforming how organizations engage with customers and patients, but success depends on a resilient, high-performance infrastructure foundation," said David Noy, vice president, unstructured data solutions product management at Dell Technologies. "Our collaboration with Penguin Solutions demonstrates how AI-optimized Dell PowerScale storage and Dell PowerEdge servers with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs can accelerate enterprise AI adoption at scale. 
Together, we're enabling Deepgram to deliver secure, low-latency voice AI experiences that power mission-critical innovation across healthcare and retail." The Deepgram-Penguin Solutions-Dell collaboration comprises a comprehensive approach for enterprises looking to modernize their customer and employee experiences. With Deepgram's API-driven voice capabilities, Penguin Solutions' AI services, and Dell's powerful AI infrastructure, organizations can achieve highly accurate, real-time transcription and speech synthesis - all while maintaining strict data governance and control. For those attending NVIDIA GTC AI Conference and Expo March 16-19, 2026, in San Jose, CA, learn more about this innovative collaboration at Dell's Booth #721 on March 17 at 3:30 p.m. for the session "Powering Enterprise Voice AI: Deepgram's Agentic Solution" presented by Penguin, Deepgram and Dell. Attendees can also stop by Penguin Solutions' booth #1031 to speak with an AI factory platform expert. Penguin Solutions is a trademark or registered trademark of Penguin Solutions, Inc. or its affiliates. All other trademarks are the property of their respective owners.
About Penguin Solutions
The most transformative technological advancements are often the hardest to deploy and optimize. Penguin Solutions, the AI factory platform company, has the innovative technologies, skills, experience, and partnerships needed to turn your AI ambitions into reality. In addition to its AI capabilities, Penguin Solutions offers memory and LED solutions serving a wide range of high-performance and specialized applications.
PR Contact: Maureen O'Leary, Penguin Solutions Corporate Communications, 1-602-330-6846
Ready to use Deepgram in WordPress? If you've been searching for a reliable way to integrate Deepgram with WordPress, Easy Text-to-Speech Pro 2.4 makes it effortless. Update today and unlock a new level of AI voice capabilities for your WordPress site.
IBM partners with Deepgram to enhance enterprise AI capabilities with advanced voice recognition and real-time captioning.
Tuesday, Feb 24, 2026 8:33 am ET
IBM and Deepgram collaborate to integrate Deepgram's speech-to-text and text-to-speech capabilities into IBM's watsonx Orchestrate generative AI solution. This integration addresses client needs for highly performant transcription and real-time captioning, offering a wider range of languages and dialects, including custom tuning and natural-sounding speech. The collaboration enhances automated customer care and support, call analysis, and voice-driven data entry in fields like healthcare and finance.
IBM and Deepgram have announced a collaboration to integrate Deepgram's speech-to-text and text-to-speech capabilities into IBM's watsonx Orchestrate generative AI solution. Deepgram becomes IBM's first voice partner for the platform. The integration aims to provide enterprise-grade transcription and real-time captioning, addressing challenges such as background noise, diverse accents and real-world dialogue. The technology supports dozens of language variants, including Arabic and Indian dialects, with options for custom tuning and natural-sounding speech. The collaboration targets applications in automated customer care, call analysis and voice-driven data entry across sectors including healthcare and finance. Deepgram has processed over 50,000 years of audio and transcribed over one trillion words. The partnership expands Deepgram's enterprise reach whilst strengthening IBM's AI ecosystem.