Full-Time

Infrastructure Engineer

Confirmed live in the last 24 hours

Deepgram

51-200 employees

Speech recognition APIs for audio transcription

No salary listed

Senior

Remote in USA

Category
DevOps & Infrastructure
Network Engineering
Platform Engineering
Required Skills
Kubernetes
Computer Networking
Requirements
  • 5+ years of experience in infrastructure engineering or similar roles
  • Strong background in network engineering and design for reliability
  • Experience with large-scale storage systems (distributed file systems, caching solutions)
  • Proven track record of managing data center infrastructure
  • Expertise in container orchestration platforms (Kubernetes, Slurm)
  • Experience with GPU infrastructure management and optimization
  • Strong automation and scripting skills
Responsibilities
  • Design and implement reliable, high-performance network architectures for distributed systems
  • Architect and maintain large-scale storage solutions, including backup systems, distributed caching, and object storage
  • Build and optimize cost-effective data center infrastructure
  • Develop and maintain GPU compute clusters for AI inference workloads
  • Manage large-scale deployments using modern orchestration platforms like Kubernetes and Slurm
  • Implement monitoring, alerting, and automation solutions for infrastructure management
Desired Qualifications
  • Experience with software-defined networking
  • Knowledge of cost optimization for cloud and on-premise infrastructure
  • Familiarity with AI/ML workloads and their infrastructure requirements
  • Experience with multi-region infrastructure deployment
  • Background in performance optimization for distributed systems
  • Certification in relevant cloud platforms (AWS, GCP, Azure)

Deepgram specializes in artificial intelligence for speech recognition, offering a set of APIs that developers can use to transcribe and understand audio content. Their technology allows clients, ranging from startups to large organizations like NASA, to process millions of audio minutes daily. Deepgram's APIs are designed to be fast, accurate, scalable, and cost-effective, making them suitable for businesses needing to handle large volumes of audio data. The company operates on a pay-per-use model, where clients are charged based on the amount of audio they transcribe, allowing Deepgram to grow its revenue alongside client usage. With a focus on the high-growth market of speech recognition, Deepgram is positioned for future success.

Company Size

51-200

Company Stage

Series B

Total Funding

$100.5M

Headquarters

San Francisco, California

Founded

2015

Simplify's Take

What believers are saying

  • Growing demand for advanced language support boosts Deepgram's multi-language transcription potential.
  • Real-time processing focus aligns with Deepgram's low-latency solutions for audio transcription.
  • Partnership with EMAM enhances media workflows with AI, expanding Deepgram's market reach.

What critics are saying

  • ElevenLabs' Scribe model outperforms Deepgram's Nova-3 in transcription accuracy.
  • Gladia's emphasis on real-time processing challenges Deepgram's competitive edge.
  • xAI's Grok-3 model may overshadow Deepgram's AI capabilities with superior infrastructure.

What makes Deepgram unique

  • Deepgram offers APIs for speech-to-text and language understanding, catering to diverse industries.
  • Deepgram's pay-per-use model aligns revenue with client usage, enhancing scalability.
  • Deepgram's acquisition of Poised enhances its voice AI capabilities for virtual meetings.

Benefits

Flexible Work Hours

Growth & Insights and Company News

Headcount

6 month growth

-2%

1 year growth

0%

2 year growth

0%
VentureBeat
Feb 26th, 2025
ElevenLabs' New Speech-To-Text Model Scribe Is Here With Highest Accuracy Rate So Far (96.7% For English)

ElevenLabs, the highly valued AI voice cloning and generation startup from Palantir alumni, today launched Scribe v1, a new speech-to-text model that reportedly achieves the highest accuracy across multiple languages. According to the company’s benchmarks, it outperforms Google’s Gemini 2.0 Flash, OpenAI’s Whisper v3 and Deepgram Nova-3 in accurately converting spoken speech into text on the web, achieving new record-low error rates. The company claims that Scribe delivers state-of-the-art transcription accuracy in 99 languages, including improved performance in previously underserved languages such as Serbian, Cantonese and Malayalam.

As Flavio Schneider, ElevenLabs lead researcher, wrote on X, Scribe is the “smartest audio understanding model” released by ElevenLabs yet. “Scribe doesn’t just transcribe - it understands audio,” Schneider continued in a thread. “It can detect non-verbal events (like laughter, sound effects, music and background noise) and analyze long audio contexts for accurate diarization, even in the most challenging environments.” “Diarization” is the name given to the process of separating speakers by their vocal qualities on a recording. In fact, ElevenLabs’ documentation states Scribe can distinguish and isolate up to 32 different speakers in the same audio file. While ElevenLabs cautions that Scribe is “best used when high-accuracy transcription is required rather than real-time transcription,” the company also plans to introduce a low-latency version soon, expanding its use for real-time applications.

Lowest word error rates (WER)

Scribe is designed to handle real-world audio challenges with precision

Decrypt
Feb 18th, 2025
Musk’s xAI Unveils Grok-3: More Power, But Is It Breaking New Ground?

Grok-3, developed by Elon Musk’s xAI, was unveiled on Monday, with the company making bold claims about its capabilities while showcasing a massive computing infrastructure that signals even bigger ambitions. The announcement focused heavily on raw computational muscle, benchmark performance, and upcoming features, though many of the actual demonstrations felt like replays of what other AI companies have already achieved. The star of the initial part of the show wasn't the AI itself, but rather "Colossus," a behemoth cluster of 200,000 GPUs that powers Grok-3's training. The system came together in two phases: 122 days of synchronous training on 100,000 GPUs, followed by 92 days of scaling up to the full 200,000. According to the xAI developers, building this infrastructure proved more challenging than developing the AI model itself. The company already has plans for an even more powerful cluster, with Musk saying they are aiming for five times the current capacity, effectively building what would be the most powerful GPU cluster on earth.

When it comes to performance, Grok-3 shows impressive results across standard AI benchmarks. The base model (the regular model without Chain of Thought and reasoning embedded) consistently tops the charts in math (AIME), science (GPQA), and coding (LCB) tests. It also seems very promising in blind tests. xAI confirmed that the mysterious model codenamed “Chocolate” was actually an early test version of Grok-3 that was uploaded to the LLM Arena. During those tests, it achieved the best Elo score among all the LLMs, meaning users preferred its answers over the generations provided by all the other AI models in direct competition without knowing which model they were evaluating. This is probably the most accurate way to measure quality without giving models any chance to cheat on benchmarks by training their AIs on those datasets. This benchmark is based purely on preference and blind choice by thousands of anonymous users.

A specialized "Reasoning Beta" variant of Grok-3, which employs internal chain-of-thought processing and additional computing at test time, pushes math scores even higher, reaching 93% on the AIME 2025 benchmark compared to the other best-performing models that rank below 87%. Interestingly, a smaller version called Grok-3 Mini Reasoning Beta sometimes outperforms its larger sibling, thanks to a longer training time. In other words, the full-size Grok-3 still has room for improvement once it receives comparable training duration, which seems promising given its greater parameter count. But when xAI moved to demonstrate Grok-3's capabilities live, the presentation felt more like a game of catch-up than innovation

CX Today
Feb 13th, 2025
Deepgram's Shortcut Beckons the Future of Personalized AI Assistants

In December, Deepgram introduced Shortcut, an on-device AI assistant that takes the user's spoken words to perform various tasks.

Pachronicle
Feb 12th, 2025
Deepgram launches improved AI-based voice transcription for enterprises

With the launch today of Nova-3, its most advanced speech-to-text (STT) model to date, Deepgram is looking to offer greater accuracy along with self-service customization to tailor results for industry-specific needs.