Full-Time

Customer Engineer

Updated on 6/3/2026

Modal

51-200 employees

Cloud-based on-demand code execution platform

Compensation Overview

$150k - $220k/yr

San Francisco, CA, USA + 1 more

More locations: New York, NY, USA

In Person

Category
Sales & Solution Engineering
Required Skills
Machine Learning
Computer Networking
Operating Systems
REST APIs
Requirements
  • Accomplished in key areas. You bring depth in either low-level infrastructure or machine learning and artificial intelligence, and you're not lost in the other.
  • Low-level infrastructure experience. Operating systems, file systems, networking, performance profiling, cluster management and distributed systems.
  • AI/ML engineering experience. Training models, optimizing inference, working with GPUs, or building ML infrastructure.
  • Automation mindset. Your instinct when you see a manual process is to eliminate it, and you have the engineering background to make that happen.
  • Clear communicator. You can explain a systems issue to a customer, write a crisp bug report, and draft documentation, all while collaborating internally to ship improvements.
Responsibilities
  • Ship code that matters. Fix bugs, build features, and create automation that improves the experience for every Modal user — not just the one who reported the issue.
  • Work directly with customers. Help developers and ML engineers debug, optimize, and architect their workloads across Slack, email, and calls.
  • Build scalable systems. Design tooling, dashboards, and automated workflows that make support efficient at scale — delighting customers at the most important moments.
  • Close the feedback loop. Translate patterns you see in the field into concrete improvements — docs fixes, API changes, or new feature proposals.
  • Contribute to open source and technical content. Write examples, build demos, and publish content that helps the broader community succeed on Modal.

Modal provides on-demand cloud compute for developers, data engineers, and ML practitioners. Users write Python and launch hundreds of custom containers in the cloud to run code and data workloads without managing infrastructure, with on-demand GPUs and serverless web endpoints. Modal charges for compute resources and offers features such as environments defined in code, fast container startup, monitoring, logs, and distributed queues. It differentiates itself through a Python-centric workflow, rapid container startup, and end-to-end cloud execution, aiming to simplify running code in the cloud at scale.

Company Size

51-200

Company Stage

Series C

Total Funding

$483M

Headquarters

New York City, New York

Founded

2021

Simplify Jobs

Simplify's Take

What believers are saying

  • Sandboxes account for more than one-third of revenue, with over one billion sandboxes launched.[1]
  • Runway adopted Modal for real-time inference and shipped Characters in under 30 days.[4]
  • Anthropic, DoorDash, and Blend validate enterprise demand for secure agent runtimes.[4]

What critics are saying

  • AWS, Google Cloud, and Azure can bundle adjacent infrastructure and compress pricing.[1][3]
  • Anthropic's managed sandboxes directly compete with Modal's fastest-growing sandbox product.[4]
  • Hyperscalers can replicate snapshotting and autoscaling, weakening Modal's differentiation.[1][4]

What makes Modal unique

  • Serverless GPU platform with Python-native infrastructure and instant container launch.[3][4]
  • Sandboxes isolate untrusted code, serving AI agents and code-execution workflows.[1][3]
  • Owns storage and compute stack, enabling GPU snapshotting and millisecond cold starts.[1][4]

Benefits

Health Insurance

Unlimited Paid Time Off

Remote Work Options

Paid Vacation

Flexible Work Hours

401(k) Retirement Plan

401(k) Company Match

Wellness Program

Mental Health Support

Gym Membership

Phone/Internet Stipend

Home Office Stipend

Professional Development Budget

Conference Attendance Budget

Stock Options

Company Equity

Parenting Leave

Family Planning Benefits

Fertility Treatment Support

Adoption Assistance

Relocation Assistance

Commuter Benefits

Employee Referral Bonus

Training Programs

Tuition Reimbursement

Professional Certification Support

Mentorship Program

Meal Benefits

Legal Services

Employee Discounts

Company Social Events

Growth & Insights and Company News

Headcount

6 month growth

-1%

1 year growth

2%

2 year growth

2%
Vibe Coded This
May 26th, 2026
Modal Labs raises $355M at $4.65B valuation as AI agent sandboxes become critical infrastructure.

Modal closed a $355M Series C led by General Catalyst and Redpoint on May 21, crossing $300M ARR after 5x growth in eight months. Its sandbox product - which powers code execution for Devin, Windsurf, and several Claude Managed Agent partners - now accounts for more than a third of revenue.

Modal Labs announced a $355 million Series C on May 21, raising its post-money valuation to $4.65 billion. General Catalyst and Redpoint led the round, with Redpoint taking a board seat. Accel, Menlo Ventures, and Bain Capital Ventures also participated, along with all existing investors doubling down. The company hit over $300 million in annualized revenue, up 5x from around $60 million at its Series B close in September 2025. That's eight months of growth.

What Modal actually does

Modal is a serverless GPU platform. You write Python, decorate a function with @app.function, and Modal runs it on the appropriate GPU, scales it to zero when idle, and handles the rest. The initial pitch was infrastructure for ML engineers who don't want to manage cloud instances. That market is real, but a different product line is now driving more than a third of revenue: sandboxes.

A Modal Sandbox is an isolated container environment for running untrusted code. The platform has launched over one billion of them. AI coding agents need this. When Devin runs a user's code, that code can't execute on shared production infrastructure. When a Claude Managed Agent builds a web app, it needs somewhere to run and test it. When an RL training run generates thousands of code samples to evaluate, each evaluation needs its own environment.

Scott Wu, CEO of Cognition (which makes Devin and Windsurf), put it plainly: "Modal powers both our reinforcement learning infrastructure and production inference. Millions of sandboxes on one end, real-time serving on the other."

DoorDash CTO Andy Fang also cited Modal by name when Anthropic announced Claude Managed Agents self-hosted sandboxes at Code with Claude London on May 19, two days before this funding announcement: "As we scale agentic commerce for local businesses, we need a highly efficient path to production with full harness control, scale, and reliability." DoorDash is one of four launch partners running Claude Managed Agents on Modal infrastructure.

Why this round is large

The $355M in a single round for a 120-person company across New York, San Francisco, and Stockholm is notable. Modal's CEO Erik Bernhardsson has said the next phase involves low-latency inference scaling and collapsing training-inference loops for reinforcement learning workflows - both GPU-intensive. The round also expands the sandboxing infrastructure that's increasingly load-bearing for the agentic software engineering stack.

A billion sandboxes isn't a marketing number. It reflects how many discrete code-execution events AI agents have generated in roughly 18 months of this product category existing. Each one is a separate Modal API call. At the current trajectory of agent deployments, that number compounds quickly.

The infrastructure layer

Modal sits a level below the AI coding tools that get most of the attention. Cursor, Windsurf, and Claude Code are the interfaces. The underlying execution environments that make them safe to run at scale are less visible. This round is a signal that the infrastructure layer has gotten large and is accelerating. The round closed in two tranches: a first close at $2.5 billion valuation and the larger second tranche at $4.65 billion.

Modal
Mar 26th, 2026
Runway chooses Modal to power real-time inference for Runway Characters.

Today, Modal Labs is announcing that Runway is partnering with Modal to power real-time inference for Runway Characters.

Runway Characters is a real-time video agent API that lets developers, startups, enterprises and consumers build fully custom conversational characters. These video agents can have any appearance and any visual style, with full control over voice, personality, knowledge and actions. Built on Runway's general world model, GWM-1, Characters generates expressive digital personas from a single image, with zero fine-tuning required.

Thousands of organizations are already using Characters, including Fortune 10 technology companies, major Hollywood studios, global advertising agencies and gaming companies, with use cases ranging from customer support and internal training to experiential advertising and immersive game worlds. Characters represents the first step toward a future of online interaction built around real-time video rather than text.

This kind of continuous, expressive, low-latency video generation held across extended conversations and experiences requires infrastructure purpose-built for real-time interaction. Modal's serverless compute platform is designed for exactly this type of workload: GPU-intensive, latency-critical and highly variable in demand. The iteration speeds Modal afforded allowed Runway's team to move from proof of concept to production in under 30 days.

"Real-time video inference is a fundamentally different engineering challenge than batch generation, especially given our customers are running these experiences globally," said Kamil Sindi, CTO of Runway. "Runway Characters requires sustained low latency across the full duration of a conversation - expressions, lip-sync, gestures - without degradation. Modal's infrastructure gave us the performance and reliability we need to ship this in every global region, at production scale."

Achieving the latency required for real-time interaction means distributing inference across multiple GPUs with high-bandwidth communication between nodes. By adding a single line of code on Modal, Runway can turn their containers into multi-node GPU clusters with RDMA networking, available instantly across every region. Modal deploys these workloads across geographies as a single unified pool, routing them close to users and scaling on demand, so Runway can serve users anywhere without pre-provisioning or managing regional infrastructure directly.

"Runway is pushing the frontier for what's possible with world models, which requires running complex models at large scale with very low latency. This is something Modal does extremely well," said Erik Bernhardsson, CEO of Modal. "We're proud to be the infrastructure powering Characters."

Runway Characters is available today to all developers and businesses at dev.runwayml.com, and to consumers at runwayml.com. Enterprise teams can reach out to learn more about deploying custom avatar experiences at scale.

Modal
Dec 2nd, 2025
Modal + Mistral 3: 10x faster cold starts with GPU snapshotting

Today, Mistral launched Mistral 3, a family of open models with frontier-class performance, customization capabilities, and trusted transparency. Modal Labs is proud to offer Day 0 support for running these models on Modal.

Modal enables developers to instantly deploy and scale Mistral 3 models without orchestrating compute infrastructure. Beyond a great DevEx and abundant GPU capacity, Modal also offers cutting-edge features like GPU memory snapshotting that can reduce median cold start time for some of these models by almost 10x, from almost two minutes to just ten seconds.

About Mistral 3

Mistral 3 is the newest frontier open model family from Mistral. It is a suite of multimodal models with strong multilingual support, and it is available in multiple sizes and capabilities for max flexibility. This blog post focuses on Ministral 3, whose size is well-suited for Modal's serverless infrastructure.

Ministral 3 is the small version of the Mistral 3 family of models and is available in 3B, 8B, and 14B sizes. It performs competitively with the Qwen 3-VL model series on benchmarks. This makes Ministral 3 well-suited for companies that are seeking a balance of intelligence and compute efficiency. See the example text for details on deployment with modal deploy.

How it works

The basic example above takes advantage of several key Modal features:

  • Serverless GPUs that automatically scale up and down from 0 based on request volume to the vLLM server.
  • Volumes, Modal's native, distributed file system, to cache model weights and compilation artifacts from vLLM.
  • Python-defined infrastructure to keep environment and hardware requirements cleanly in sync with application code.

Together, these features allow developers to deploy Mistral 3 without being blocked on acquiring GPU quota or managing complex configuration surface areas.

Now, speed up cold starts by almost 10x

Modal recently launched a new GPU snapshotting feature in alpha. This can drastically reduce cold starts for workloads that require heavy initialization work - like spinning up a vLLM server. Modal Labs tested this on the 3B version of Ministral 3 and saw an almost 10x reduction in median cold start time, from ~118s to ~12s. Drastically shorter cold starts means you can deploy Ministral 3 in a way that is both cost-efficient and responsive to user demand.

vLLM + Ministral 3 3B

To use this feature, you must enable Sleep Mode for your vLLM server and set experimental_options={"enable_gpu_snapshot": True} in your Modal App. The first time the vLLM server finishes initializing, it will be put to sleep. This shifts most of the contents of GPU memory to CPU memory, which facilitates the snapshotting process. Upon subsequent starts, the vLLM server is restored from this snapshot. Try it out for yourself by deploying the code sample here.
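
As a quick sanity check on the "almost 10x" claim, the short sketch below recomputes the speedup from the two approximate median cold start times quoted in the post (~118s and ~12s). The function and variable names are illustrative only and are not part of Modal's or vLLM's API; the inputs are the rounded figures from the text, so the result is approximate as well.

```python
def speedup(before_s: float, after_s: float) -> float:
    """Return how many times shorter `after_s` is than `before_s`."""
    if after_s <= 0:
        raise ValueError("after_s must be positive")
    return before_s / after_s


# Approximate medians quoted in the post (rounded, not raw measurements).
baseline_s = 118.0  # cold start without GPU snapshotting
snapshot_s = 12.0   # cold start when restoring from a GPU snapshot

ratio = speedup(baseline_s, snapshot_s)
saved_s = baseline_s - snapshot_s

print(f"{ratio:.1f}x faster, {saved_s:.0f}s saved per cold start")
# prints "9.8x faster, 106s saved per cold start"
```

With these rounded inputs the ratio comes out just under 10, which matches the post's "almost 10x" wording.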

Google
Sep 29th, 2025
Modal Labs Raises $80M in Funding

Modal Labs, an AI infrastructure startup, has raised $80 million in funding, as reported by Bloomberg.com.

Breakit
Sep 29th, 2025
Modal raises over 800 million SEK

Modal, a Swedish-founded company, has raised over $80 million in a funding round led by Lux Capital, with participation from existing investors. The company aims to provide AI firms with access to vast computing power through a global, serverless platform, eliminating the need for companies to manage expensive GPUs. The new funds will be used to expand their product offerings. Modal, founded four years ago, has now raised a total of $111 million. Earlier this year, Modal acquired the data startup Twirl.