Full-Time
Cloud-based on-demand code execution platform
$300k - $350k/yr
New York, NY, USA
In Person
Modal provides on-demand cloud compute for developers, data engineers, and ML practitioners. Users write Python and launch hundreds of custom containers in the cloud to run code and data workloads without managing infrastructure, with access to on-demand GPUs and serverless web endpoints. It charges for compute resources and offers features such as environments defined in code, fast container startup, monitoring, logs, and distributed queues. It differentiates itself through a Python-centric workflow, rapid container startup, and end-to-end cloud execution, aiming to simplify running code in the cloud at scale.
Company Size
51-200
Company Stage
Series C
Total Funding
$483M
Headquarters
New York City, New York
Founded
2021
Health Insurance
Unlimited Paid Time Off
Remote Work Options
Paid Vacation
Flexible Work Hours
401(k) Retirement Plan
401(k) Company Match
Wellness Program
Mental Health Support
Gym Membership
Phone/Internet Stipend
Home Office Stipend
Professional Development Budget
Conference Attendance Budget
Stock Options
Company Equity
Parenting Leave
Family Planning Benefits
Fertility Treatment Support
Adoption Assistance
Relocation Assistance
Commuter Benefits
Employee Referral Bonus
Training Programs
Tuition Reimbursement
Professional Certification Support
Mentorship Program
Meal Benefits
Legal Services
Employee Discounts
Company Social Events
Modal Labs raises $355M at $4.65B valuation as AI agent sandboxes become critical infrastructure
Modal closed a $355M Series C led by General Catalyst and Redpoint on May 21, crossing $300M ARR after 5x growth in eight months. Its sandbox product - which powers code execution for Devin, Windsurf, and several Claude Managed Agent partners - now accounts for more than a third of revenue.
Modal Labs announced a $355 million Series C on May 21, raising its post-money valuation to $4.65 billion. General Catalyst and Redpoint led the round, with Redpoint taking a board seat. Accel, Menlo Ventures, and Bain Capital Ventures also participated, as did all existing investors. The company hit over $300 million in annualized revenue, up 5x from around $60 million at its Series B close in September 2025, eight months earlier.
What Modal actually does
Modal is a serverless GPU platform. You write Python, decorate a function with @app.function, and Modal runs it on the appropriate GPU, scales it to zero when idle, and handles the rest. The initial pitch was infrastructure for ML engineers who don't want to manage cloud instances. That market is real, but a different product line is now driving more than a third of revenue: sandboxes.
A Modal Sandbox is an isolated container environment for running untrusted code. The platform has launched over one billion of them. AI coding agents need this. When Devin runs a user's code, that code can't execute on shared production infrastructure. When a Claude Managed Agent builds a web app, it needs somewhere to run and test it. When an RL training run generates thousands of code samples to evaluate, each evaluation needs its own environment.
Scott Wu, CEO of Cognition (which makes Devin and Windsurf), put it plainly: "Modal powers both our reinforcement learning infrastructure and production inference. Millions of sandboxes on one end, real-time serving on the other."
DoorDash CTO Andy Fang also cited Modal by name when Anthropic announced Claude Managed Agents self-hosted sandboxes at Code with Claude London on May 19, two days before this funding announcement: "As we scale agentic commerce for local businesses, we need a highly efficient path to production with full harness control, scale, and reliability." DoorDash is one of four launch partners running Claude Managed Agents on Modal infrastructure.
Why this round is large
Raising $355M in a single round is notable for a 120-person company spread across New York, San Francisco, and Stockholm. Modal's CEO Erik Bernhardsson has said the next phase involves low-latency inference scaling and collapsing training-inference loops for reinforcement learning workflows - both GPU-intensive. The round also expands the sandboxing infrastructure that's increasingly load-bearing for the agentic software engineering stack.
A billion sandboxes isn't a marketing number. It reflects how many discrete code-execution events AI agents have generated in roughly 18 months of this product category existing. Each one is a separate Modal API call. At the current trajectory of agent deployments, that number compounds quickly.
The infrastructure layer
Modal sits a level below the AI coding tools that get most of the attention. Cursor, Windsurf, and Claude Code are the interfaces. The underlying execution environments that make them safe to run at scale are less visible. This round is a signal that the infrastructure layer has gotten large and is accelerating. The round closed in two tranches: a first close at a $2.5 billion valuation and the larger second tranche at $4.65 billion.
Runway chooses Modal to power real-time inference for Runway Characters
Today, Modal Labs is announcing that Runway is partnering with Modal to power real-time inference for Runway Characters. Runway Characters is a real-time video agent API that lets developers, startups, enterprises and consumers build fully custom conversational characters. These video agents can have any appearance and any visual style, with full control over voice, personality, knowledge and actions. Built on Runway's general world model, GWM-1, Characters generates expressive digital personas from a single image, with zero fine-tuning required. Thousands of organizations are already using Characters, including Fortune 10 technology companies, major Hollywood studios, global advertising agencies and gaming companies, with use cases ranging from customer support and internal training to experiential advertising and immersive game worlds. Characters represents the first step toward a future of online interaction built around real-time video rather than text.
This kind of continuous, expressive, low-latency video generation held across extended conversations and experiences requires infrastructure purpose-built for real-time interaction. Modal's serverless compute platform is designed for exactly this type of workload: GPU-intensive, latency-critical and highly variable in demand. The iteration speeds Modal afforded allowed Runway's team to move from proof of concept to production in under 30 days.
"Real-time video inference is a fundamentally different engineering challenge than batch generation, especially given our customers are running these experiences globally," said Kamil Sindi, CTO of Runway. "Runway Characters requires sustained low latency across the full duration of a conversation - expressions, lip-sync, gestures - without degradation. Modal's infrastructure gave us the performance and reliability we need to ship this in every global region, at production scale."
Achieving the latency required for real-time interaction means distributing inference across multiple GPUs with high-bandwidth communication between nodes. By adding a single line of code on Modal, Runway can turn their containers into multi-node GPU clusters with RDMA networking, available instantly across every region. Modal deploys these workloads across geographies as a single unified pool, routing them close to users and scaling on demand, so Runway can serve users anywhere without pre-provisioning or managing regional infrastructure directly.
"Runway is pushing the frontier for what's possible with world models, which requires running complex models at large scale with very low latency. This is something Modal does extremely well," said Erik Bernhardsson, CEO of Modal. "We're proud to be the infrastructure powering Characters."
Runway Characters is available today to all developers and businesses at dev.runwayml.com, and to consumers at runwayml.com. Enterprise teams can reach out to learn more about deploying custom avatar experiences at scale.
Modal + Mistral 3: 10x faster cold starts with GPU snapshotting
Today, Mistral launched Mistral 3, a family of open models with frontier-class performance, customization capabilities, and trusted transparency. Modal Labs is proud to offer Day 0 support for running these models on Modal. Modal enables developers to instantly deploy and scale Mistral 3 models without orchestrating compute infrastructure. Beyond a great DevEx and abundant GPU capacity, Modal also offers cutting-edge features like GPU memory snapshotting that can reduce median cold start time for some of these models by almost 10x, from almost two minutes to just ten seconds.
About Mistral 3
Mistral 3 is the newest frontier open model family from Mistral. It is a suite of multimodal models with strong multilingual support, and it is available in multiple sizes and capabilities for maximum flexibility. This blog post focuses on Ministral 3, whose size is well-suited for Modal's serverless infrastructure.
Ministral 3 is the small version of the Mistral 3 family of models and is available in 3B, 8B, and 14B sizes. It performs competitively with the Qwen 3-VL model series on benchmarks. This makes Ministral 3 well-suited for companies that are seeking a balance of intelligence and compute efficiency. See the example text for details on deployment with modal deploy.
How it works
The basic example takes advantage of several key Modal features:
* Serverless GPUs that automatically scale up and down from 0 based on request volume to the vLLM server.
* Volumes, Modal's native, distributed file system, to cache model weights and compilation artifacts from vLLM.
* Python-defined infrastructure to keep environment and hardware requirements cleanly in sync with application code.
Together, these features allow developers to deploy Mistral 3 without being blocked on acquiring GPU quota or managing complex configuration surface areas.
Now, speed up cold starts by almost 10x
Modal recently launched a new GPU snapshotting feature in alpha. This can drastically reduce cold starts for workloads that require heavy initialization work - like spinning up a vLLM server. Modal Labs tested this on the 3B version of Ministral 3 and saw an almost 10x reduction in median cold start time, from ~118s to ~12s. Drastically shorter cold starts mean you can deploy Ministral 3 in a way that is both cost-efficient and responsive to user demand.
vLLM + Ministral 3 3B
To use this feature, you must enable Sleep Mode for your vLLM server and set experimental_options={"enable_gpu_snapshot": True} in your Modal App. The first time the vLLM server finishes initializing, it will be put to sleep. This shifts most of the contents of GPU memory to CPU memory, which facilitates the snapshotting process. Upon subsequent starts, the vLLM server is restored from this snapshot. Try it out for yourself by deploying the code sample here.
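As a rough illustration of the two settings described above, here is a minimal sketch in plain Python that only assembles the configuration and does not import Modal or vLLM. It is not the post's own code sample. The model identifier is a hypothetical placeholder, the `--enable-sleep-mode` flag is assumed to be the vLLM switch for Sleep Mode, and `enable_memory_snapshot` is assumed to be the Modal setting that GPU snapshots build on. Only the `experimental_options` value is taken from the text.

```python
from typing import Dict, List

# Hypothetical placeholder: the post does not give the exact model identifier.
MODEL_NAME = "example-org/ministral-3-3b"


def vllm_serve_command(model: str, port: int = 8000) -> List[str]:
    """Build a vLLM server command line with Sleep Mode turned on.

    Assumption: `--enable-sleep-mode` is the vLLM flag that corresponds to
    the "Sleep Mode" the post refers to.
    """
    return ["vllm", "serve", model, "--port", str(port), "--enable-sleep-mode"]


def modal_snapshot_options() -> Dict[str, object]:
    """Keyword arguments meant for a Modal function or class decorator."""
    return {
        # Assumption: GPU snapshots are layered on Modal's memory snapshots.
        "enable_memory_snapshot": True,
        # Quoted from the post.
        "experimental_options": {"enable_gpu_snapshot": True},
    }


def cold_start_speedup(before_s: float, after_s: float) -> float:
    """Return how many times shorter the cold start is after snapshotting."""
    return before_s / after_s


if __name__ == "__main__":
    print(" ".join(vllm_serve_command(MODEL_NAME)))
    print(modal_snapshot_options())
    # Median figures reported in the post: ~118s before, ~12s after.
    print(f"{cold_start_speedup(118, 12):.1f}x")
```

In a real app the dictionary would be unpacked into the decorator that defines the vLLM server, and the command list would be passed to a subprocess call inside the container. With the reported medians the ratio works out to about 9.8, which matches the "almost 10x" figure.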
Modal Labs, an AI infrastructure startup, has raised $80 million in funding, as reported by Bloomberg.com.
Modal, a Swedish-founded company, has raised over $80 million in a funding round led by Lux Capital, with participation from existing investors. The company aims to provide AI firms with access to vast computing power through a global, serverless platform, eliminating the need for companies to manage expensive GPUs. The new funds will be used to expand its product offerings. Modal, founded four years ago, has now raised a total of $111 million. Earlier this year, Modal acquired the data startup Twirl.