Full-Time

Senior Software Engineer

Backend

Deepgram

51-200 employees

Speech recognition APIs for audio transcription

No salary listed

Senior

Remote in USA

Category
Backend Engineering
Software Engineering
Required Skills
Rust
Python
Git
C/C++
Linux/Unix
Requirements
  • Several years of experience in an industry role
  • Programming experience in Rust (or C, C++), with competence in Python
  • Excellent communication and organizational skills, both written and verbal
  • Strong experience with and understanding of version control, preferably Git
  • Comprehensive experience with UNIX-style systems
Responsibilities
  • Improve Deepgram’s core inference services, including networking, speech processing, audio transcoding, and latency and memory optimization
  • Develop processes for measuring, building, and optimizing services to maximize system performance
  • Debug complex system issues involving networking, scheduling, and high-performance computing interactions
  • Rapidly customize backend services to meet customer needs
  • Partner with Product to design and implement new services, features, and/or products end to end
Desired Qualifications
  • Experience with modern machine learning, such as familiarity with a framework like Torch or implementation knowledge of architectures like CNNs, RNNs, and transformers
  • Experience with audio processing

Deepgram specializes in artificial intelligence for speech recognition, offering a set of APIs that developers can use to transcribe and understand audio content. Its technology is designed to process large volumes of audio quickly and accurately, making it suitable for a variety of clients, from startups to large organizations like NASA. Unlike many competitors, Deepgram uses a pay-per-use model that lets clients pay only for the audio they transcribe, which can be more cost-effective for businesses. The company's goal is to meet the growing demand for speech recognition technology by providing scalable and efficient solutions for audio processing.

Company Size

51-200

Company Stage

Series B

Total Funding

$100.5M

Headquarters

San Francisco, California

Founded

2015

Simplify Jobs

Simplify's Take

What believers are saying

  • Growing demand for advanced language support boosts Deepgram's multi-language transcription potential.
  • Real-time processing focus aligns with Deepgram's low-latency solutions for audio transcription.
  • Partnership with EMAM enhances media workflows with AI, expanding Deepgram's market reach.

What critics are saying

  • ElevenLabs' Scribe model outperforms Deepgram's Nova-3 in transcription accuracy.
  • Gladia's emphasis on real-time processing challenges Deepgram's competitive edge.
  • xAI's Grok-3 model may overshadow Deepgram's AI capabilities with superior infrastructure.

What makes Deepgram unique

  • Deepgram offers APIs for speech-to-text and language understanding, catering to diverse industries.
  • Deepgram's pay-per-use model aligns revenue with client usage, enhancing scalability.
  • Deepgram's acquisition of Poised enhances its voice AI capabilities for virtual meetings.

Benefits

Comprehensive Health Plans

FSA Health Matching up to $1,000

Work from Home Ergonomic Stipend

Healthy Food & Snacks in offices

Community Groups

Unlimited Vacation

Growth & Insights and Company News

Headcount

6 month growth

-2%

1 year growth

-1%

2 year growth

0%

VentureBeat
Feb 26th, 2025
ElevenLabs’ New Speech-to-Text Model Scribe Is Here With Highest Accuracy Rate So Far (96.7% for English)

ElevenLabs, the highly valued AI voice cloning and generation startup from Palantir alumni, today launched Scribe v1, a new speech-to-text model that reportedly achieves the highest accuracy across multiple languages. Users can try it on the ElevenLabs site. According to the company’s benchmarks, it outperforms Google’s Gemini 2.0 Flash, OpenAI’s Whisper v3, and Deepgram Nova-3 at accurately converting spoken speech into text on the web, achieving new record-low error rates. The company claims that Scribe delivers state-of-the-art transcription accuracy in 99 languages, including improved performance in previously underserved languages such as Serbian, Cantonese, and Malayalam. As ElevenLabs lead researcher Flavio Schneider wrote on X, Scribe is the “smartest audio understanding model” released by ElevenLabs yet. “Scribe doesn’t just transcribe — it understands audio,” Schneider continued in a thread. “It can detect non-verbal events (like laughter, sound effects, music and background noise) and analyze long audio contexts for accurate diarization, even in the most challenging environments.” “Diarization” is the process of separating the speakers on a recording by their vocal qualities. In fact, ElevenLabs’ documentation states that Scribe can distinguish and isolate up to 32 different speakers in the same audio file. While ElevenLabs cautions that Scribe is “best used when high-accuracy transcription is required rather than real-time transcription,” the company also plans to introduce a low-latency version soon, expanding its use to real-time applications.

Lowest word error rates (WER)

Scribe is designed to handle real-world audio challenges with precision

Decrypt
Feb 18th, 2025
Musk’s xAI Unveils Grok-3: More Power, but Is It Breaking New Ground?

Grok-3, developed by Elon Musk’s xAI, was unveiled on Monday, with the company making bold claims about its capabilities while showcasing a massive computing infrastructure that signals even bigger ambitions. The announcement focused heavily on raw computational muscle, benchmark performance, and upcoming features, though many of the actual demonstrations felt like replays of what other AI companies have already achieved. The star of the first part of the show wasn't the AI itself but "Colossus," a behemoth cluster of 200,000 GPUs that powers Grok-3's training. The system came together in two phases: 122 days of synchronous training on 100,000 GPUs, followed by 92 days of scaling up to the full 200,000. According to the xAI developers, building this infrastructure proved more challenging than developing the AI model itself. The company already plans an even more powerful cluster, with Musk saying they are aiming for five times the current capacity, which would make it the most powerful GPU cluster on earth.

When it comes to performance, Grok-3 shows impressive results across standard AI benchmarks. The base model (the standard model without built-in chain-of-thought reasoning) consistently tops the charts in math (AIME), science (GPQA), and coding (LCB) tests. It also looks very promising in blind tests. xAI confirmed that the mysterious model codenamed “Chocolate” was an early test version of Grok-3 uploaded to the LLM Arena. During those tests, it achieved the highest Elo rating among all the LLMs, meaning users preferred its answers over those of every other model in head-to-head comparisons, without knowing which model they were evaluating. This is probably the most accurate way to measure quality, since it gives models no chance to game benchmarks by training on the benchmark datasets. The ranking is based purely on the blind preferences of thousands of anonymous users.

A specialized "Reasoning Beta" variant of Grok-3, which uses internal chain-of-thought processing and additional compute at test time, pushes math scores even higher, reaching 93% on the AIME 2025 benchmark, while the other best-performing models score below 87%. Interestingly, a smaller version called Grok-3 Mini Reasoning Beta sometimes outperforms its larger sibling, thanks to longer training. In other words, the full-size Grok-3 still has room for improvement once it receives a comparable training duration, which seems promising given its greater parameter count. But when xAI moved to demonstrate Grok-3's capabilities live, the presentation felt more like a game of catch-up than innovation

CX Today
Feb 13th, 2025
Deepgram's Shortcut Beckons the Future of Personalized AI Assistants

In December, Deepgram introduced Shortcut, an on-device AI assistant that carries out various tasks based on the user's spoken words.

Pachronicle
Feb 12th, 2025
Deepgram launches improved AI-based voice transcription for enterprises

With the launch today of Nova-3, its most advanced speech-to-text (STT) model to date, Deepgram is looking to offer greater accuracy along with self-service customization to tailor results for industry-specific needs.