Full-Time
Posted on 7/29/2025
AI workload optimization across heterogeneous hardware
$150k - $350k/yr
San Francisco, CA, USA
In Person
Gimlet Labs provides a platform that optimizes AI workloads by separating software from the underlying hardware. The system works by breaking AI tasks into components and automatically mapping them to the most efficient mix of GPUs, CPUs, and accelerators without requiring developers to rewrite their code. Unlike competitors that often lock users into specific hardware brands, Gimlet uses a hardware-agnostic compiler and autonomous kernel generation to run applications across different vendors like NVIDIA, Intel, and AMD. The company's goal is to reduce the cost and complexity of running AI by creating a flexible, high-performance infrastructure for both data centers and cloud developers.
Company Size
11-50
Company Stage
Series A
Total Funding
$92M
Headquarters
San Francisco, California
Founded
2023
Flexible Work Hours
Multichip inference cloud startup Gimlet Labs receives $80M to solve one of AI's biggest bottlenecks.

Gimlet Labs Inc. said today it has raised $80 million in early-stage funding to address a bottleneck that is holding back artificial intelligence inference. The startup, which has raised $92 million in total, has created what it describes as the world's first and only "multi-silicon inference cloud." It differs from standard inference clouds because it enables AI workloads to run simultaneously across various kinds of chips: an AI application's work can be split across traditional central processing units, high-performance graphics processing units, and other kinds of processors. Inference is the process of using a trained machine learning model to make predictions or decisions on new, unseen data, turning AI into action.

Menlo Ventures partner Tim Tully explained in a blog post why multi-silicon inference is so useful. When an autonomous AI agent is assigned a task, it may "chain together dozens of model calls, retrieval steps and tool invocations across non-linear branching logic," and each step in that chain is best performed by different hardware: prefill is compute-bound, decode is memory-bound and tool calls are network-bound. "No single chip can handle all three efficiently. Instead, the answer is heterogeneous," Tully said. Compute-intensive batch inference is best done on GPUs, while latency-sensitive workloads benefit from running on specialized processors with large amounts of static random-access memory, such as those from Groq, Cerebras and d-Matrix, which deliver exceptional speed. Tasks such as orchestration and tool use, on the other hand, generally run better on CPUs. "The multi-silicon fleet is ready - it's just missing the software layer to make it work," Tully said.
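The heterogeneous mapping Tully describes can be pictured as a simple lookup from each pipeline step's bottleneck to a hardware class. The sketch below is purely illustrative: the function names, the pipeline representation and the mapping itself are our own hypothetical shorthand, not Gimlet's actual software.

```python
# Toy scheduler illustrating the heterogeneous-mapping idea described above.
# The names and mapping are hypothetical, not Gimlet's actual API.

# Each bottleneck class is served best by a different kind of silicon.
BOTTLENECK_TO_HARDWARE = {
    "compute": "GPU",              # e.g. prefill, batch inference
    "memory": "SRAM accelerator",  # e.g. decode (Groq/Cerebras/d-Matrix class)
    "network": "CPU",              # e.g. tool calls, orchestration
}

def schedule(pipeline):
    """Assign each (step_name, bottleneck) pair to a hardware class."""
    return [(step, BOTTLENECK_TO_HARDWARE[kind]) for step, kind in pipeline]

# An agentic task chains steps with different bottlenecks, as in Tully's example.
agent_task = [
    ("prefill", "compute"),
    ("decode", "memory"),
    ("tool_call", "network"),
]

for step, hw in schedule(agent_task):
    print(f"{step} -> {hw}")
```

The point of the sketch is simply that no single entry in the mapping covers all three bottlenecks, which is why a single-chip fleet leaves hardware idle.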
By splitting AI tasks across multiple processors in this way, Gimlet Labs says, it can dramatically improve efficiency and reduce the time chips spend sitting idle, waiting for instructions. It reckons it can speed up inference workloads by anywhere from three to 10 times for the same cost and power. It can even slice up AI models themselves, so that different parts of them run on different chips.

Gimlet Labs founder and Chief Executive Zain Asgar told TechCrunch in an interview that existing hardware generally runs at only 15% to 30% efficiency. "You're wasting hundreds of billions of dollars because you're just leaving idle resources," he said. "Our goal was basically to try to figure out how you can get AI workloads to be 10 times more efficient than ever, today."

The startup's software is not aimed at rank-and-file developers. Instead, Gimlet Labs is targeting the operators of the largest AI model labs and the most expansive data centers. Its partners include some of the biggest chipmakers, among them Nvidia Corp., Advanced Micro Devices Inc., Intel Corp., Arm Holdings Plc and Cerebras Systems Inc. Asgar told TechCrunch that the company is already generating eight-figure revenue despite launching its platform only in October. In the last four months it has doubled its customer base, with clients including a major model maker and an "extremely large" cloud computing company, he said.

The Series A round was led by Menlo Ventures with participation from Factory, which led the company's seed round, as well as Eclipse, Prosperity7 and Triatomic. The new funding is aimed at giving Gimlet Labs the resources it needs to scale and to make high-speed, efficient multichip inference the norm. With that in mind, the startup plans to expand its team and grow its inference cloud to meet the rapidly growing demand for faster inference.
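Asgar's utilization figures and the claimed 3-10x speedup are consistent with simple arithmetic: raising effective utilization from the low end of the quoted range toward full use multiplies throughput at the same cost. The numbers below are a back-of-the-envelope illustration, not Gimlet's methodology; the baseline comes from the article, the improved figure is a hypothetical target.

```python
# Back-of-the-envelope check of the efficiency claim above.
# Baseline is from the article (15-30% typical); the improved
# utilization is a hypothetical heterogeneous-scheduling target.
baseline_utilization = 0.20
improved_utilization = 0.70

speedup = improved_utilization / baseline_utilization  # ~3.5x
print(f"{speedup:.1f}x effective throughput at the same cost")
```

A baseline at 15% utilization pushed to near-full use would land at the top of the quoted 3-10x range by the same arithmetic.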
Gimlet Labs, a startup founded by Stanford adjunct professor Zain Asgar, has raised $80 million in a Series A round led by Menlo Ventures. The company has created what it claims is the first "multi-silicon inference cloud", software that allows AI workloads to run simultaneously across diverse hardware types including CPUs, GPUs and high-memory systems. The technology addresses the AI inference bottleneck by splitting workloads across whatever hardware is available, potentially improving efficiency by 3x to 10x. Asgar says existing data centre hardware is only utilised 15-30% of the time, wasting hundreds of billions of dollars. Launched in October with eight-figure revenues, Gimlet has partnered with chip makers including NVIDIA, AMD, Intel and Cerebras. The company has now raised $92 million total and employs 30 people.
Today, we're announcing our $80M Series A raise, led by Menlo Ventures and joined by Eclipse, Factory, Prosperity7, and Triatomic.
d-Matrix and Gimlet Labs have announced a partnership to deliver 10x performance improvements for agentic AI inference workloads. Gimlet Cloud will deploy d-Matrix Corsair accelerators alongside GPUs, achieving significant gains in latency and throughput per watt compared to GPU-only approaches. The solution divides workloads between GPUs and d-Matrix accelerators, with Corsair's memory-optimised architecture handling memory-bound portions of AI models. This is particularly effective for latency-sensitive tasks like speculative decoding used in large-scale AI deployments. Gimlet's software intelligently maps workloads across multiple accelerator types and vendors. The combined solution will be available to select customers through Gimlet Cloud in the second half of 2026. Both companies emphasise power efficiency as crucial for advancing AI infrastructure amid growing energy constraints.
Gimlet Labs Inc. launched with $12M in funding from investors including Factory, Intel CEO Lip-Bu Tan, and others. The startup offers a platform that allows AI models to be ported across different chips without code changes, optimizing efficiency and reducing costs. It disaggregates AI workloads into components, deploying each on the most suitable chip. The platform uses a custom compiler for chip-specific optimizations and is generating "8-figure revenues" from Fortune 500 companies and AI providers.