Full-Time
Cloud-based on-demand code execution platform
$150k - $270k/yr
New York, NY, USA
In Person
Modal provides on-demand cloud compute for developers, data engineers, and ML practitioners. Users write Python to launch hundreds of custom containers in the cloud, running code and data workloads without managing infrastructure, with on-demand GPUs and serverless web endpoints. Modal charges for the compute resources consumed and offers features such as environments defined in code, fast container startup, monitoring, logs, and distributed queues. It differentiates itself through a Python-centric workflow, rapid container startup, and end-to-end cloud execution, aiming to simplify running code in the cloud at scale.
Company Size
51-200
Company Stage
Series B
Total Funding
$128M
Headquarters
New York City, New York
Founded
2021
Health Insurance
Unlimited Paid Time Off
Remote Work Options
Paid Vacation
Flexible Work Hours
401(k) Retirement Plan
401(k) Company Match
Wellness Program
Mental Health Support
Gym Membership
Phone/Internet Stipend
Home Office Stipend
Professional Development Budget
Conference Attendance Budget
Stock Options
Company Equity
Parenting Leave
Family Planning Benefits
Fertility Treatment Support
Adoption Assistance
Relocation Assistance
Commuter Benefits
Employee Referral Bonus
Training Programs
Tuition Reimbursement
Professional Certification Support
Mentorship Program
Meal Benefits
Legal Services
Employee Discounts
Company Social Events
Runway chooses Modal to power real-time inference for Runway Characters.

Today, Modal Labs is announcing that Runway is partnering with Modal to power real-time inference for Runway Characters. Runway Characters is a real-time video agent API that lets developers, startups, enterprises, and consumers build fully custom conversational characters. These video agents can take on any appearance and any visual style, with full control over voice, personality, knowledge, and actions. Built on Runway's general world model, GWM-1, Characters generates expressive digital personas from a single image, with zero fine-tuning required.

Thousands of organizations are already using Characters, including Fortune 10 technology companies, major Hollywood studios, global advertising agencies, and gaming companies, with use cases ranging from customer support and internal training to experiential advertising and immersive game worlds. Characters represents a first step toward a future of online interaction built around real-time video rather than text.

This kind of continuous, expressive, low-latency video generation, sustained across extended conversations and experiences, requires infrastructure purpose-built for real-time interaction. Modal's serverless compute platform is designed for exactly this type of workload: GPU-intensive, latency-critical, and highly variable in demand. The iteration speed Modal afforded let Runway's team move from proof of concept to production in under 30 days.

"Real-time video inference is a fundamentally different engineering challenge than batch generation, especially given our customers are running these experiences globally," said Kamil Sindi, CTO of Runway. "Runway Characters requires sustained low latency across the full duration of a conversation - expressions, lip-sync, gestures - without degradation. Modal's infrastructure gave us the performance and reliability we need to ship this in every global region, at production scale."
Achieving the latency required for real-time interaction means distributing inference across multiple GPUs with high-bandwidth communication between nodes. By adding a single line of code on Modal, Runway can turn its containers into multi-node GPU clusters with RDMA networking, available instantly across every region. Modal deploys these workloads across geographies as a single unified pool, routing them close to users and scaling on demand, so Runway can serve users anywhere without pre-provisioning or managing regional infrastructure directly.

"Runway is pushing the frontier for what's possible with world models, which requires running complex models at large scale with very low latency. This is something Modal does extremely well," said Erik Bernhardsson, CEO of Modal. "We're proud to be the infrastructure powering Characters."

Runway Characters is available today to all developers and businesses at dev.runwayml.com, and to consumers at runwayml.com. Enterprise teams can reach out to learn more about deploying custom avatar experiences at scale.
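The announcement does not include the code itself. As a rough illustration only, based on Modal's experimental clustering API, the "single line" that turns a container into a multi-node RDMA cluster might look something like the sketch below. The app name, cluster size, GPU type, and function body are illustrative assumptions, not code from Runway.

```python
import modal

app = modal.App("realtime-video-inference")

@app.function(gpu="H100:8")  # 8 GPUs per node; hardware choice is illustrative
@modal.experimental.clustered(size=2)  # the added line: form a 2-node RDMA cluster
def infer():
    # Each container in the cluster can discover its rank and its peers,
    # then join a collective-communication group over the RDMA fabric.
    info = modal.experimental.get_cluster_info()
    print(f"node {info.rank} of {len(info.container_ips)}")
```

The same function definition works unchanged whether it runs as one container or as a cluster; only the decorator changes, which is what makes the "single line of code" claim plausible.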
Modal + Mistral 3: 10x faster cold starts with GPU snapshotting.

Today, Mistral launched Mistral 3, a family of open models with frontier-class performance, customization capabilities, and trusted transparency. Modal Labs is proud to offer Day 0 support for running these models on Modal. Modal enables developers to instantly deploy and scale Mistral 3 models without orchestrating compute infrastructure. Beyond a great DevEx and abundant GPU capacity, Modal also offers cutting-edge features like GPU memory snapshotting, which can reduce median cold start time for some of these models by almost 10x, from almost two minutes to just ten seconds.

About Mistral 3. Mistral 3 is the newest frontier open model family from Mistral. It is a suite of multimodal models with strong multilingual support, available in multiple sizes and capabilities for maximum flexibility. This blog post focuses on Ministral 3, whose size is well-suited to Modal's serverless infrastructure. Ministral 3 is the small version of the Mistral 3 family and is available in 3B, 8B, and 14B sizes. It performs competitively with the Qwen 3-VL model series on benchmarks, making it well-suited for companies seeking a balance of intelligence and compute efficiency. See the example text for details on deployment with modal deploy.

How it works. The basic example above takes advantage of several key Modal features:

* Serverless GPUs that automatically scale up and down from 0 based on request volume to the vLLM server.
* Volumes, Modal's native, distributed file system, to cache model weights and compilation artifacts from vLLM.
* Python-defined infrastructure to keep environment and hardware requirements cleanly in sync with application code.

Together, these features allow developers to deploy Mistral 3 without being blocked on acquiring GPU quota or managing complex configuration surface areas.

Now, speed up cold starts by almost 10x.
Modal recently launched a new GPU snapshotting feature in alpha. It can drastically reduce cold starts for workloads that require heavy initialization work, like spinning up a vLLM server. Modal Labs tested this on the 3B version of Ministral 3 and saw an almost 10x reduction in median cold start time, from ~118s to ~12s. Drastically shorter cold starts mean you can deploy Ministral 3 in a way that is both cost-efficient and responsive to user demand.

vLLM + Ministral 3 3B. To use this feature, enable Sleep Mode for your vLLM server and set experimental_options={"enable_gpu_snapshot": True} in your Modal App. The first time the vLLM server finishes initializing, it is put to sleep, which shifts most of the contents of GPU memory to CPU memory and facilitates the snapshotting process. On subsequent starts, the vLLM server is restored from this snapshot. Try it out for yourself by deploying the code sample here.
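The referenced code sample is not reproduced in this page. A minimal sketch of the pattern described above, combining the serverless-GPU, Volume, and snapshot features, might look roughly like this. The image contents, model identifier, GPU type, cache path, and vLLM flags are illustrative assumptions; experimental_options={"enable_gpu_snapshot": True} is the option named in the post.

```python
import modal

app = modal.App("ministral-3-vllm")

# Python-defined infrastructure: the environment lives next to the code.
image = modal.Image.debian_slim(python_version="3.12").pip_install("vllm")

# A Volume caches model weights and vLLM compilation artifacts across runs.
cache = modal.Volume.from_name("ministral-cache", create_if_missing=True)

@app.function(
    image=image,
    gpu="H100",  # GPU type is an illustrative assumption
    volumes={"/root/.cache": cache},
    experimental_options={"enable_gpu_snapshot": True},  # alpha snapshot feature
)
@modal.web_server(port=8000)
def serve():
    import subprocess

    # Launch an OpenAI-compatible vLLM server. The model id and the
    # --enable-sleep-mode flag are assumptions; Sleep Mode is what lets
    # GPU memory be offloaded to CPU before the snapshot is taken.
    subprocess.Popen(
        [
            "vllm", "serve", "mistralai/Ministral-3-8B",
            "--port", "8000",
            "--enable-sleep-mode",
        ]
    )
```

Deployed with modal deploy, a function like this scales from zero on demand; after the first full initialization, subsequent containers would restore from the GPU snapshot rather than re-initializing vLLM from scratch.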
Modal Labs, an AI infrastructure startup, has raised $80 million in funding, as reported by Bloomberg.com.
Modal, a Swedish-founded company, has raised over $80 million in a funding round led by Lux Capital, with participation from existing investors. The company aims to provide AI firms with access to vast computing power through a global, serverless platform, eliminating the need for companies to manage expensive GPUs. The new funds will be used to expand its product offerings. Modal, founded four years ago, has now raised a total of $111 million. Earlier this year, Modal acquired the data startup Twirl.