Full-Time

Senior Software Engineer

Agents, Python

LiveKit

LiveKit

51-200 employees

Open-source WebRTC platform with managed cloud

Compensation Overview

$120k - $250k/yr

+ Equity

Company Does Not Provide H1B Sponsorship

Remote in USA

Remote

Category
Software Engineering
Required Skills
Rust
Python
REST APIs
Requirements
  • Strong experience building production systems in Python
  • Excellent API design instincts and strong engineering taste
  • Able to independently drive projects from idea to completion
  • Solid experience building SDKs, frameworks, or developer tools
  • Comfortable navigating complex systems and making design tradeoffs
  • Able to clearly communicate technical ideas and architectural decisions
Responsibilities
  • Design and implement core components of the Agents Python framework
  • Define APIs and abstractions that developers rely on
  • Own features from design through implementation
  • Identify simple and elegant solutions to complex design problems
  • Propose and debate architectural decisions that shape the framework
  • Ensure new features integrate cleanly with the broader ecosystem
  • Collaborate with engineers across Python, Node, and Rust systems
Desired Qualifications
  • Open-source contributions or libraries you maintain
  • Experience building LLM-based or agentic applications
  • Familiarity with TypeScript / JavaScript ecosystems

LiveKit provides an open-source platform to build real-time audio and video apps using an end-to-end WebRTC stack. It also offers LiveKit Cloud, a fully-managed global hosting service that takes care of real-time media infrastructure so developers can focus on their applications. The model includes both a self-hosted open-source option and a paid cloud service, serving individuals to large enterprises. Its goal is to help developers add scalable real-time communication features to their products without managing the underlying media infrastructure.

Company Size

51-200

Company Stage

Series C

Total Funding

$181.2M

Headquarters

San Jose, California

Founded

2021

Simplify Jobs

Simplify's Take

What believers are saying

  • Series C raised $100M at $1B valuation on January 22, 2026.
  • Telnyx partnership cuts AI voice costs 50% with sub-200ms latency.
  • Agent Builder enables one-click deployment from browser prototypes.

What critics are saying

  • Telnyx undercuts LiveKit Cloud pricing by 50% and waives session fees.
  • OpenAI builds in-house stack, prompting xAI and Salesforce exodus in 18 months.
  • Deepgram bundles STT with WebRTC, capturing enterprise deployments in 12 months.

What makes LiveKit unique

  • LiveKit provides end-to-end WebRTC stack powering ChatGPT Advanced Voice Mode.
  • livekit-wakeword achieves 100x fewer false positives than openWakeWord.
  • Adaptive Interruption Handling delivers 86% precision across multilingual conversations.


Benefits

Health Insurance

Dental Insurance

Vision Insurance

Unlimited Paid Time Off

Remote Work Options

Company Equity

Growth & Insights and Company News

Headcount

6 month growth

0%

1 year growth

1%

2 year growth

1%
The Associated Press
Apr 6th, 2026
Telnyx launches LiveKit platform with 50% lower AI voice costs and sub-200ms latency

Telnyx has launched LiveKit on Telnyx, a fully hosted platform for deploying voice AI agents with reduced costs and ultra-low latency. The platform allows developers to run existing LiveKit agents on Telnyx-owned infrastructure without code changes. By owning the entire infrastructure stack—carrier network, GPU clusters and telephony—Telnyx offers 50% lower speech-to-text and text-to-speech costs compared to LiveKit Cloud. The company is waiving session fees during the beta period, eliminating the current $0.01 per minute charge LiveKit customers typically pay. The platform achieves sub-200ms round-trip time by hosting speech models on colocated GPU infrastructure across 18 global points of presence. It includes enterprise telephony features and compliance standards including HIPAA, PCI and SOC 2. LiveKit on Telnyx is now available in beta.

LiveKit
Apr 6th, 2026
Open-source wake word training in a single command.

Wake words are the short spoken phrases, like "Hey Siri" or "Alexa," that activate a voice-enabled device or agent. They're the first step in any hands-free voice interaction, and getting them right matters: too sensitive and they fire constantly, too strict and users have to repeat themselves. Today LiveKit is launching livekit-wakeword, an open-source wake word library built for simplicity and speed.

Why LiveKit built this.

If you've tried training wake word models before, you know the pain:

* Existing codebases are outdated, with broken dependencies everywhere.
* Documentation is sparse or nonexistent, so training new models requires hours or even days of reverse-engineering.

And even if you manage to train a model, you still end up with one that false-triggers constantly because you used the vanilla settings the authors provided. LiveKit built livekit-wakeword to fix all of this. Now you can train your own wake word model from scratch, locally, with a single command.

Use cases.

Custom wake words unlock hands-free voice activation across a wide range of applications:

* Voice agents: give your AI agent a branded activation phrase ("Hey Jarvis," "OK Chef") instead of relying on a generic keyword.
* Smart home assistants: train a custom phrase for your home setup without depending on cloud services.
* Robotics: activate a robot with a spoken command in noisy warehouse or factory environments.
* Kiosks and accessibility devices: enable hands-free activation for retail, healthcare, or public-facing hardware.
* In-car and embedded systems: trigger voice control in vehicles or IoT devices running on constrained hardware.

Performance.

Even though the library is simple and fast, LiveKit didn't sacrifice accuracy. Compared to openWakeWord, livekit-wakeword achieved dramatically better results across every metric: 100x fewer false positives per hour, 60x lower detection error, and 86% vs 69% recall.

| Metric | livekit-wakeword | openWakeWord |
| False positives per hour (FPPH) | 0.08 | 8.50 |
| Detection error tradeoff (AUT) | 0.0012 | 0.0720 |
| Recall | 86% | 69% |

FPPH measures how often the model incorrectly fires when no wake word was spoken (lower is better). AUT (area under the DET curve) captures the overall tradeoff between false positives and missed detections. See the full comparison for DET curves, test conditions, and detailed methodology.

How it works.

Under the hood, livekit-wakeword generates thousands of synthetic training samples using text-to-speech, then applies realistic audio augmentations (background noise, reverb, gain variation) to simulate real-world conditions; a small illustrative sketch of this augmentation step appears at the end of this section. A lightweight convolutional-attention classifier trains on top of pre-computed audio embeddings, producing a small, fast model that generalizes well beyond its training data. Since the exported models use the same ONNX format and inference pipeline as openWakeWord, they're fully compatible: your Home Assistant or legacy projects still work with zero changes.

Part of the LiveKit ecosystem.

livekit-wakeword is designed to work seamlessly with the LiveKit platform. Use a wake word to trigger a LiveKit Agent session: the wake word model runs locally on-device with minimal latency, and once activated, LiveKit handles the realtime audio streaming to your agent.
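
To make the augmentation step above more concrete, here is a minimal, illustrative sketch of gain variation and background-noise mixing for 16 kHz mono audio. It is not livekit-wakeword's actual pipeline: the helper names, SNR ranges, and the omission of reverb are assumptions made purely for brevity.

    # Illustrative sketch of synthetic-sample augmentation (NOT livekit-wakeword's
    # actual pipeline). Assumes 16 kHz mono float32 audio held in numpy arrays.
    import numpy as np

    SAMPLE_RATE = 16_000

    def vary_gain(audio: np.ndarray, low_db: float = -6.0, high_db: float = 6.0) -> np.ndarray:
        """Randomly scale loudness to simulate users speaking at different distances."""
        gain_db = np.random.uniform(low_db, high_db)
        return audio * (10.0 ** (gain_db / 20.0))

    def mix_background_noise(audio: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
        """Mix a noise clip into the sample at a target signal-to-noise ratio."""
        noise = np.resize(noise, audio.shape)  # loop or trim noise to match length
        signal_power = np.mean(audio ** 2) + 1e-12
        noise_power = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
        return audio + scale * noise

    def augment(sample: np.ndarray, noise_bank: list[np.ndarray]) -> np.ndarray:
        """Apply gain variation plus background noise to one synthetic TTS sample."""
        out = vary_gain(sample)
        noise = noise_bank[np.random.randint(len(noise_bank))]
        out = mix_background_noise(out, noise, snr_db=np.random.uniform(5.0, 20.0))
        return np.clip(out, -1.0, 1.0)

Reverb and the other augmentations mentioned in the post are left out here to keep the sketch short.
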
Start building with livekit-wakeword.

To train a new wake word model, install the library and run setup:

    # install livekit-wakeword with training, evaluation, and export extras
    pip install livekit-wakeword[train,eval,export]

    # download required embedding models and datasets
    livekit-wakeword setup

Then create a config file for your wake word:

    model_name: hey_robot
    target_phrases:
      - "hey robot"

    n_samples: 10000              # synthetic training samples per class
    model:
      model_type: conv_attention  # the new conv-attention classifier
      model_size: small
    steps: 50000                  # training steps

Check out the README for the full list of config options. Once your config is ready, you can train your model with a single command:

    # generates synthetic data, augments, trains, and exports to ONNX
    # your model will be saved to ./output/hey_robot/hey_robot.onnx
    livekit-wakeword run configs/hey_robot.yaml

That single command handles everything: synthetic data generation, augmentation, training, and ONNX export. You'll get a production-ready model file you can use right away. The exported model is a standard ONNX file, fully backward compatible with openWakeWord, so it drops into Home Assistant or any existing openWakeWord integration with zero changes.

To run detection, just load the model and feed it audio:

    from livekit.wakeword import WakeWordModel

    # load your exported ONNX model
    model = WakeWordModel(models=["hey_robot.onnx"])

    # feed 16kHz audio frames (int16 or float32)
    scores = model.predict(audio_frame)
    if scores["hey_robot"] > 0.5:
        print("Wake word detected!")

LiveKit also provides a WakeWordListener that handles all the audio capture for you, so you can listen from the microphone without writing any audio code yourself:

    from livekit.wakeword import WakeWordModel, WakeWordListener

    model = WakeWordModel(models=["hey_robot.onnx"])

    # captures audio from the microphone and runs detection automatically
    async with WakeWordListener(model, threshold=0.5) as listener:
        while True:
            detection = await listener.wait_for_detection()
            print(f"Detected {detection.name}!")

For a complete example that uses wake word detection to spawn a LiveKit agent, check out hello-wakeword.

Other runtimes.

For production deployments, LiveKit currently supports Rust, with more runtimes on the roadmap.

Future directions.

On the hardware side, the current architecture already runs comfortably on single-board computers, but LiveKit is taking it further: it is building an end-to-end model that removes the need for a separate embedding model, making it small enough to run directly on ESP32 and other embedded microcontrollers. Want to get involved? Check out the repo and join the developer community to share what you're building.

LiveKit
Mar 19th, 2026
Solving unwanted interruptions with Adaptive Interruption Handling.

Knowing whose turn it is to speak remains one of the hardest problems in voice AI. You've probably already used LiveKit's transformer-based End-of-Turn detection, which tells the agent when the user is truly done speaking. But what about the other direction, when the agent is talking and the user wants to jump in? The naive solution, stopping the moment the user makes any sound, seems simple on paper. In reality it destroys the conversation.

Most voice agents today rely on simple Voice Activity Detection (VAD) to decide whether to stop when speech is detected during the agent's turn. But VAD alone is not sufficient, because many sounds can trigger it: brief backchannels ("mm-hmm", "yeah"), user noises like sighs or coughs, or background sounds like typing, music, or chatter. Treat every one of those as a full interruption and your agent becomes jittery and robotic. LiveKit has been focused on solving this properly for a long time, and today it is announcing that Adaptive Interruption Handling is generally available in LiveKit Agents.

How Adaptive Interruption Handling works.

LiveKit trained a brand-new audio-based interruption detection model specifically for this problem. When user speech is detected during the agent's turn, the model analyzes the user's audio stream within the first few hundred milliseconds of detected speech. It looks for distinctive acoustic characteristics of true interruptions, including:

* Overall waveform shape
* Strength and sharpness of speech onset
* Duration of the signal
* Prosodic features such as pitch and rhythm

This allows it to quickly determine whether the user is beginning a new utterance or just making incidental sounds. To learn these patterns, the model was trained on examples of real interruptions, backchannels, and other sounds extracted from natural one-on-one conversations. It learned to discriminate between genuine attempts to interrupt and incidental speech or noise that would normally trigger a simple VAD.

Architecturally, the system combines an audio encoder with a convolutional neural network (CNN) to extract and analyze acoustic patterns in the waveform. This design lets the model identify the signatures of true interruptions while ignoring non-interruptive sounds, resulting in more natural and responsive conversations with the agent. A small illustrative sketch of this gating flow appears just before the benchmarks below.

The data challenge.

To teach the model to behave more like a human listener, LiveKit needed a diverse set of conversations that capture natural backchanneling and barge-ins. This kind of data is very sparse in human-agent conversations, because most voice agents today simply can't handle it correctly. Instead, LiveKit turned to human-to-human conversations. The team went on a full data-gathering mission and collected hundreds of hours of real human speech across many different topics and languages. The raw audio then went through a data enrichment pipeline that mixed in a variety of noises to simulate the real-world diversity of inputs LiveKit expects to see.

One particularly exciting outcome is that the model is multilingual and generalizes effectively to languages it has never seen before. Rather than simply memorizing patterns from the training data, it has learned the underlying conversational dynamics and can infer correctly in new scenarios.
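
As a rough illustration of the gating flow described above, the sketch below buffers the first few hundred milliseconds of audio after VAD fires during the agent's turn, scores it with a classifier, and interrupts only on a confident result. The classify_interruption function, the agent object, the 300 ms window, and the 0.5 threshold are all placeholders for illustration, not LiveKit's model or API.

    # Hypothetical sketch of VAD-gated interruption handling (not LiveKit's API).
    # Idea: when VAD fires while the agent is speaking, collect ~300 ms of audio,
    # classify it, and only stop the agent for genuine interruptions.
    import numpy as np

    SAMPLE_RATE = 16_000
    ANALYSIS_WINDOW_MS = 300       # how much speech to inspect before deciding
    INTERRUPT_THRESHOLD = 0.5      # stand-in decision threshold

    def classify_interruption(audio: np.ndarray) -> float:
        """Stand-in for an acoustic classifier; returns P(true interruption)."""
        raise NotImplementedError("plug in a real model here")

    def handle_vad_trigger(audio_stream, agent) -> None:
        """Called when VAD detects user speech while the agent is talking."""
        needed = SAMPLE_RATE * ANALYSIS_WINDOW_MS // 1000
        window = np.empty(0, dtype=np.float32)
        for frame in audio_stream:              # frames of float32 samples
            window = np.concatenate([window, frame])
            if window.size >= needed:
                break
        score = classify_interruption(window[:needed])
        if score >= INTERRUPT_THRESHOLD:
            agent.stop_speaking()               # genuine barge-in: yield the turn
        # otherwise treat it as a backchannel or noise and keep talking

In LiveKit's production system, the post notes, this classification runs in LiveKit Cloud data centers with inference completing in 30 ms or less.
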
Benchmarks.

LiveKit evaluated the model on a held-out dataset and has observed strong results in production:

* 86% precision and 100% recall (at 500 ms of overlapping speech)
* Rejects 51% of VAD-based barge-ins (false positives avoided)
* Detects true barge-ins faster than VAD in 64% of cases
* Completes inference in 30 ms or less
* Median audio duration needed to trigger an interruption: 216 ms
* Consistently strong performance across noisy environments and multiple languages

Using Adaptive Interruption Handling.

The model is enabled by default in Python Agents v1.5.0+ and TypeScript Agents v1.2.0+. Every agent deployed to LiveKit Cloud gets it automatically, with no extra models to deploy or manage. To fall back to classic VAD-based interruption detection, use the new turn_handling config:

Python

    session = AgentSession(
        ...
        turn_handling=TurnHandlingOptions(
            interruption={
                "mode": "vad",
            },
        ),
    )

TypeScript

    const session = new AgentSession({
      interruption: {
        mode: "vad",
      },
    })

When interruption.mode is not specified, it defaults to "adaptive" on LiveKit Cloud or in dev mode. The model is deployed directly in LiveKit Cloud data centers for optimal inference latency. LiveKit has been building a family of models that significantly improve conversational flow. They're trained on proprietary data, tend to be larger, and are optimized for GPU inference, which makes them impractical to bundle into agent containers. Adaptive Interruption Handling is included at no extra cost for all agents deployed to LiveKit Cloud. For local development and testing, every plan includes 40,000 free inference requests per month.

Try it today.

Adaptive Interruption Handling is the missing piece that makes voice agents feel truly conversational instead of polite but robotic. The fastest way to try it is in the Agents Playground: run an agent on your machine, speak over it or backchannel, and hear the difference immediately. LiveKit has also added a brand-new debugging panel with clear visualizations that show exactly when the model detects a barge-in versus a backchannel, which makes debugging and tuning straightforward. Give it a try! LiveKit would love to hear what you build and any feedback you have.

LiveKit
Mar 3rd, 2026
Design beautiful voice AI experiences with Agents UI

Today LiveKit is introducing Agents UI, a component library that lets you build polished multimodal agent interfaces in minutes. These are not placeholder UI kits or sample code. Every component in Agents UI is built for real production use, designed atop shadcn/ui and its proven React conventions, and integrated with the same LiveKit platform you use to ship agents to production. From the very first install, you're building the real thing.

How it works.

Agents UI components are installed via the shadcn CLI, so they land directly in your codebase. That means you can inspect the source, modify behavior, and adjust styling without waiting for library updates or working around API limitations. LiveKit provides built-in controls for audio input and output, audio visualizers, session lifecycle management, and chat interactions out of the box. You can customize every component with Tailwind CSS classes, extend it to match your brand, and modify the source directly since it lives in your repo. Every component can be themed, so it's easy to express your brand's unique style.

Built on familiar primitives.

Agents UI builds on top of shadcn and AI Elements, combining well-known React patterns with components tailored specifically to voice agent workflows. One highlight is the aura audio visualizer, designed in partnership with Unicorn Studio: a shader-based visualizer that responds smoothly to live audio, is more expressive than generic volume bars, and is fast enough to run comfortably in realtime apps. Agents UI focuses on the patterns that repeat across every voice agent project:

* Media controls for managing audio and video input
* Audio visualizers so users know the agent is listening or speaking
* Session management for handling agent state and lifecycle
* Chat and transcript views for text interaction and message history

Designed to grow with your product.

Agents UI is not a UI framework that locks you in. It is designed to give you ownership, and LiveKit is investing in patterns that make the transition from first component to fully custom interface seamless. Components live in your repository, so you can extend them as your product evolves. Add custom hooks, integrate with your state management, or fork components entirely: the code is yours. As LiveKit learns from production use cases, it will add new components and patterns based on real-world feedback.

What's next.

Agents UI is not meant to be the only way to build agent interfaces, but it is a core part of how LiveKit improves the developer experience of building on LiveKit. Upcoming additions include:

* Additional voice and video control components
* More audio visualizer styles and customization options
* Advanced session state management patterns
* Templates for common agent interface layouts

The goal is to package the patterns that work in production, then give you the flexibility to customize them for whatever your product requires.

Start building with Agents UI today.

Agents UI is available now, and getting started takes just a few steps. First, if you haven't set up shadcn, run:

    npx shadcn@latest init

Then add the Agents UI registry and install components with:

    npx shadcn@latest registry add @agents-ui
    npx shadcn@latest add @agents-ui/{component-name}

Most Agents UI components require a LiveKit session object.
Create one from a TokenSource and wrap your components in an AgentSessionProvider:

    'use client';

    import {
      TokenSource,  // import location assumed; adjust to where your LiveKit SDK exports it
      useAgent,
      useSession,
      useSessionContext,
      useSessionMessages,
    } from '@livekit/components-react';
    import { AgentSessionProvider } from '@/components/agents-ui/agent-session-provider';
    import { AgentControlBar } from '@/components/agents-ui/agent-control-bar';
    import { AgentAudioVisualizerAura } from '@/components/agents-ui/agent-audio-visualizer-aura';
    import { AgentChatTranscript } from '@/components/agents-ui/agent-chat-transcript';

    const TOKEN_SOURCE = TokenSource.sandboxTokenServer(
      process.env.NEXT_PUBLIC_SANDBOX_TOKEN_SERVER_ID
    );

    export default function DemoWrapper() {
      // create the session once and share it with all Agents UI components
      const session = useSession(TOKEN_SOURCE);
      return (
        <AgentSessionProvider session={session}>
          <Demo />
        </AgentSessionProvider>
      );
    }

    export function Demo() {
      const session = useSessionContext();
      const { messages } = useSessionMessages(session);
      const { audioTrack, state } = useAgent(session);
      return (
        <>
          <AgentChatTranscript agentState={state} messages={messages} />
          <AgentAudioVisualizerAura
            size="xl"
            state={state}
            color="#1FD5F9"
            colorShift={0.1}
            audioTrack={audioTrack}
          />
          <AgentControlBar
            variant="default"
            isChatOpen={false}
            isConnected={true}
            controls={['microphone', 'camera', 'chat']}
          />
        </>
      );
    }

That's it. The components are now part of your app and fully customizable with Tailwind CSS. Check out the full documentation and source code in the GitHub repository. For a complete example application, see agent-starter-react. To build your own shader-based audio visualizers that react to voice and agent state in realtime, see the custom visualizer tutorial. If you've been building agent interfaces from scratch or patching together UI libraries that weren't designed for realtime voice, Agents UI is the fastest way to ship production-quality components and set yourself up to customize as you scale. Try it today and let LiveKit know how it goes.

Rocketnews
Jan 22nd, 2026
Voice AI engine and OpenAI partner LiveKit hits $1B valuation

Voice AI engine and OpenAI partner LiveKit hits $1B valuation. LiveKit, a developer of infrastructure software for real-time AI voice and video applications, has announced the raise of $100 million in funding at a $1 billion valuation. The round, which comes 10 months after LiveKit's previous fundraise, was led by Index Ventures with participation from existing investors, including Altimeter Capital Management, Hanabi Capital, and Redpoint Ventures. LiveKit powers OpenAI's ChatGPT voice mode. The startup's other customers include xAI, Salesforce, Tesla, as well as 911 emergency service operators and mental health providers. The company was founded in 2021 by Russ d'Sa and David Zhao as an open source software project for building apps that can transmit real-time audio and video without interruptions, in an era when the whole world was meeting on Zoom during the pandemic. Although LiveKit began as a free developer tool, the business took off after the founders realized big companies wanted a managed cloud version and began providing those services to enterprises amid the voice AI boom.