Full-Time
Confirmed live in the last 24 hours
Speech recognition APIs for audio transcription
No salary listed
Senior
Remote in USA
Deepgram specializes in artificial intelligence for speech recognition, offering a set of APIs that developers can use to transcribe and understand audio content. Their technology allows clients, ranging from startups to large organizations like NASA, to process millions of audio minutes daily. Deepgram's APIs are designed to be fast, accurate, scalable, and cost-effective, making them suitable for businesses needing to handle large volumes of audio data. The company operates on a pay-per-use model, where clients are charged based on the amount of audio they transcribe, allowing Deepgram to grow its revenue alongside client usage. With a focus on the high-growth market of speech recognition, Deepgram is positioned for future success.
Company Size
51-200
Company Stage
Series B
Total Funding
$100.5M
Headquarters
San Francisco, California
Founded
2015
Flexible Work Hours
ElevenLabs, the highly valued AI voice cloning and generation startup from former Palantir alumni, today launched Scribe v1, a new speech-to-text model that reportedly achieves the highest accuracy across multiple languages. Users can try it on the ElevenLabs site.

According to the company's benchmarks, it outperforms Google's Gemini 2.0 Flash, OpenAI's Whisper v3, and Deepgram Nova-3 at converting speech into text, achieving new record-low error rates. The company claims that Scribe delivers state-of-the-art transcription accuracy in 99 languages, including improved performance in previously underserved languages such as Serbian, Cantonese, and Malayalam.

As Flavio Schneider, ElevenLabs lead researcher, wrote on X, Scribe is the "smartest audio understanding model" released by ElevenLabs yet. "Scribe doesn't just transcribe — it understands audio," Schneider continued in a thread. "It can detect non-verbal events (like laughter, sound effects, music and background noise) and analyze long audio contexts for accurate diarization, even in the most challenging environments."

"Diarization" is the process of separating the speakers on a recording by their vocal qualities. ElevenLabs' documentation states that Scribe can distinguish and isolate up to 32 different speakers in the same audio file. While ElevenLabs cautions that Scribe is "best used when high-accuracy transcription is required rather than real-time transcription," the company also plans to introduce a low-latency version soon, expanding its use to real-time applications.

Lowest word error rates (WER)

Scribe is designed to handle real-world audio challenges with precision.
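As a rough sketch of how a developer might call a speech-to-text endpoint like Scribe with diarization enabled: the endpoint path, the `scribe_v1` model identifier, the `xi-api-key` header, and the form-field names below are assumptions based on ElevenLabs' public API as generally described, and should be verified against the current API reference before use.

```python
# Assumed endpoint; check ElevenLabs' API reference for the current path.
ELEVENLABS_STT_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def build_scribe_request(api_key: str, diarize: bool = True) -> dict:
    """Assemble the headers and form fields for a Scribe transcription call.

    All field names here are assumptions modeled on ElevenLabs' docs.
    """
    return {
        "headers": {"xi-api-key": api_key},
        "data": {
            "model_id": "scribe_v1",          # assumed model identifier
            "diarize": str(diarize).lower(),  # label up to 32 speakers
            "tag_audio_events": "true",       # laughter, music, noise, etc.
        },
    }


def transcribe(api_key: str, audio_path: str) -> dict:
    """Upload an audio file and return the parsed JSON transcript."""
    import requests  # third-party; only needed for the actual network call

    req = build_scribe_request(api_key)
    with open(audio_path, "rb") as f:
        resp = requests.post(
            ELEVENLABS_STT_URL,
            headers=req["headers"],
            data=req["data"],
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.json()
```

The response would then contain the transcript text along with per-speaker segments and any tagged audio events, per the diarization behavior the article describes.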
Grok-3, developed by Elon Musk's xAI, was unveiled on Monday, with the company making bold claims about its capabilities while showcasing a massive computing infrastructure that signals even bigger ambitions. The announcement focused heavily on raw computational muscle, benchmark performance, and upcoming features, though many of the actual demonstrations felt like replays of what other AI companies have already achieved.

The star of the initial part of the show wasn't the AI itself, but rather "Colossus," a behemoth cluster of 200,000 GPUs that powers Grok-3's training. The system came together in two phases: 122 days of synchronous training on 100,000 GPUs, followed by 92 days of scaling up to the full 200,000. According to the xAI developers, building this infrastructure proved more challenging than developing the AI model itself. The company already has plans for an even more powerful cluster, with Musk saying they are aiming for five times the current capacity, effectively building what would be the most powerful GPU cluster on Earth.

When it comes to performance, Grok-3 shows impressive results across standard AI benchmarks. The base model (the regular model, without chain-of-thought reasoning embedded) consistently tops the charts in math (AIME), science (GPQA), and coding (LCB) tests. It also seems very promising in blind tests: xAI confirmed that the mysterious model codenamed "Chocolate" was actually an early test version of Grok-3 that had been uploaded to the LLM Arena. During those tests, it achieved the best Elo rating among all the LLMs, meaning users preferred its answers over the generations of every other AI model in direct competition, without knowing which model they were evaluating. This is probably the most accurate way to measure quality without giving models any chance to cheat on benchmarks by training on those datasets.
This benchmark is based purely on preference and blind choice by thousands of anonymous users.

[Image: xAI team shows off Grok-3's benchmark tests during a live presentation. Image: xAI]

A specialized "Reasoning Beta" variant of Grok-3, which employs internal chain-of-thought processing and additional computing at test time, pushes math scores even higher, reaching 93% on the AIME 2025 benchmark, whereas the other best-performing models rank below 87%. Interestingly, a smaller version called Grok-3 Mini Reasoning Beta sometimes outperforms its larger sibling, thanks to a longer training time. In other words, the full-size Grok-3 still has room for improvement once it receives comparable training duration, which seems promising given its greater parameter count. But when xAI moved to demonstrate Grok-3's capabilities live, the presentation felt more like a game of catch-up than innovation.
In December, Deepgram introduced Shortcut, an on-device AI assistant that takes the user's spoken words to perform various tasks.
With the launch today of Nova-3, its most advanced speech-to-text (STT) model to date, Deepgram is looking to offer greater accuracy along with self-service customization to tailor results for industry-specific needs.