Summer 2026

Member of Technical Staff Intern

Updated on 5/12/2026

Gimlet Labs

11-50 employees

AI workload optimization across heterogeneous hardware

Compensation Overview

$50 - $80/hr

San Francisco, CA, USA

In Person

Category
Software Engineering
Required Skills
Machine Learning
Requirements
  • Currently pursuing a degree in computer science, engineering, or a comparable field
  • Experience with AI/ML or distributed systems
Responsibilities
  • Building, deploying and scaling AI systems for production
  • Evaluating and implementing cutting-edge AI research
  • Researching ways to improve model accuracy, performance and efficiency
Desired Qualifications
  • Experience with PyTorch, TensorFlow, ONNX and other AI frameworks
  • Familiarity with distributed systems and orchestration frameworks (e.g., Kubernetes)
  • Software development experience with Python and C++
  • Understanding of the latest AI research and techniques

Gimlet Labs provides a platform that optimizes AI workloads by separating software from the underlying hardware. The system works by breaking AI tasks into components and automatically mapping them to the most efficient mix of GPUs, CPUs, and accelerators without requiring developers to rewrite their code. Unlike competitors that often lock users into specific hardware brands, Gimlet uses a hardware-agnostic compiler and autonomous kernel generation to run applications across different vendors like NVIDIA, Intel, and AMD. The company's goal is to reduce the cost and complexity of running AI by creating a flexible, high-performance infrastructure for both data centers and cloud developers.
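The disaggregate-and-map idea described above can be sketched in a few lines. This is a toy illustration only: the `Stage` type, the `HARDWARE_PROFILES` table, and `schedule` are assumptions invented for this summary, not Gimlet's actual API or scheduler.

```python
"""Toy sketch: split a workload into stages and map each stage to the
hardware class best suited to its bottleneck. All names are illustrative."""
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    bound: str  # dominant resource: "compute", "memory", or "network"

# Hypothetical mapping from dominant resource to hardware class,
# following the description above (GPUs for compute-heavy work,
# memory-optimized accelerators for memory-bound work, CPUs for
# orchestration and network-bound tool calls).
HARDWARE_PROFILES = {
    "compute": "GPU",
    "memory": "SRAM accelerator",
    "network": "CPU",
}

def schedule(stages):
    """Assign each pipeline stage to the hardware class matching its bottleneck."""
    return {s.name: HARDWARE_PROFILES[s.bound] for s in stages}

pipeline = [
    Stage("prefill", "compute"),
    Stage("decode", "memory"),
    Stage("tool_call", "network"),
]
print(schedule(pipeline))
# {'prefill': 'GPU', 'decode': 'SRAM accelerator', 'tool_call': 'CPU'}
```

In a real system the mapping would be driven by profiling and cost models rather than a static table, but the core move, routing each component to the silicon where it is cheapest, is the same.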

Company Size

11-50

Company Stage

Series A

Total Funding

$92M

Headquarters

San Francisco, California

Founded

2023

Simplify Jobs

Simplify's Take

What believers are saying

  • Raised $92M total, including $80M Series A from Menlo Ventures in March 2026, fueling scaling.
  • Achieved eight-figure revenues since October 2025 launch with Fortune 500 and AI lab customers.
  • d-Matrix partnership delivers 10x speed and power gains for agentic inference in H2 2026.

What critics are saying

  • NVIDIA Dynamo optimizes CUDA 2-4x faster on Hopper GPUs, locking users into single-vendor stacks within 6-12 months.
  • OpenAI Triton 3.0 natively supports AMD/Intel via Microsoft, erasing Gimlet's efficiency moat in 6-12 months.
  • Cerebras CS-3 bundles end-to-end inference software, capturing Fortune 500 deals with 20x throughput in 9-15 months.

What makes Gimlet Labs unique

  • Gimlet decouples agentic AI workloads via intelligent orchestrator and hardware-agnostic compiler.
  • Autonomous kernel generation optimizes kernels for NVIDIA, AMD, Intel, and Cerebras without code changes.
  • Multi-silicon inference cloud splits models across CPUs, GPUs, and accelerators for 3-10x efficiency.


Benefits

Flexible Work Hours

Growth & Insights and Company News

Headcount

6 month growth

-3%

1 year growth

7%

2 year growth

0%
SiliconANGLE Media
Mar 24th, 2026
Multichip inference cloud startup Gimlet Labs receives $80M to solve one of AI's biggest bottlenecks.

Gimlet Labs Inc. said today it has raised $80 million in early-stage funding to solve a bottleneck that's holding back artificial intelligence inference. The startup, which has raised $92 million in total, has created what's said to be the world's first and only "multi-silicon inference cloud." It differs from standard inference clouds because it enables AI workloads to run simultaneously across various kinds of chips: an AI application's work can be split across traditional central processing units, high-performance graphics processing units, and other kinds of processors. Inference is the process of using a trained machine learning model to make predictions or decisions on new, unseen data, turning AI into action.

Menlo Ventures partner Tim Tully explained in a blog post why multi-silicon inference is so useful. When an autonomous AI agent is assigned a task, it may "chain together dozens of model calls, retrieval steps and tool invocations across non-linear branching logic." Each step in this chain is best performed by different hardware: prefill is compute-bound, decode is memory-bound, and tool calls are network-bound. "No single chip can handle all three efficiently. Instead, the answer is heterogeneous," Tully said. Compute-intensive batch inference is best done on GPUs, while latency-sensitive workloads benefit from specialized processors heavy in static random-access memory (SRAM), such as those from Groq, Cerebras, and d-Matrix, which deliver exceptional speed. Orchestration and tool use, on the other hand, generally run better on CPUs. "The multi-silicon fleet is ready - it's just missing the software layer to make it work," Tully said.
By splitting AI tasks across multiple processors in this way, Gimlet Labs says it can dramatically improve efficiency and reduce the time chips spend sitting idle waiting for instructions. It reckons it can speed up inference workloads by anywhere from three to 10 times for the same cost and power. It can even slice up AI models themselves, so that different parts of a model run on different chips. Gimlet Labs founder and Chief Executive Zain Asgar told TechCrunch in an interview that existing hardware generally runs at only 15% to 30% efficiency. "You're wasting hundreds of billions of dollars because you're just leaving idle resources," he said. "Our goal was basically to try to figure out how you can get AI workloads to be 10 times more efficient than ever, today."

The startup's software is not aimed at rank-and-file developers. Instead, Gimlet Labs is targeting the organizations that run the largest AI model labs and the most expansive data centers. Its partners include some of the biggest chipmakers: Nvidia Corp., Advanced Micro Devices Inc., Intel Corp., Arm Holdings Plc, and Cerebras Systems Inc. Asgar told TechCrunch that the company is already generating eight-figure revenue despite only launching its platform in October. In the last four months it has doubled its customer base, with clients including a major model maker and an "extremely large" cloud computing company, he said.

The Series A round was led by Menlo Ventures, with participation from Factory, which led the company's seed round, as well as Eclipse, Prosperity7, and Triatomic. The funding is intended to give Gimlet Labs the resources to scale and to make high-speed, efficient multichip inference the norm; with that in mind, the startup plans to expand its team and grow its inference cloud to meet rapidly growing demand for faster inference.
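Asgar's utilization figures translate directly into cost. A back-of-the-envelope check, using only the numbers quoted in the article (the 60% target in the last line is an illustrative assumption, not a company claim):

```python
# Back-of-the-envelope: what 15-30% hardware utilization implies for cost.
# The 15% and 30% figures come from the article; the rest is illustrative.

def effective_multiplier(utilization: float) -> float:
    """Hardware you must provision per unit of useful work delivered."""
    return 1.0 / utilization

for u in (0.15, 0.30):
    print(f"At {u:.0%} utilization: {1 - u:.0%} of capacity sits idle, "
          f"so effective cost is {effective_multiplier(u):.1f}x the useful work")

# If heterogeneous scheduling lifted utilization from 15% to a hypothetical
# 60%, the same fleet would deliver about 4x more useful work, which is
# one arithmetic route to a "3-10x" efficiency claim.
print(round(0.60 / 0.15, 1))  # prints 4.0
```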

TechCrunch
Mar 23rd, 2026
Gimlet Labs raises $80M Series A for multi-silicon AI inference cloud software

Gimlet Labs, a startup founded by Stanford adjunct professor Zain Asgar, has raised $80 million in a Series A round led by Menlo Ventures. The company has created what it claims is the first "multi-silicon inference cloud": software that allows AI workloads to run simultaneously across diverse hardware types, including CPUs, GPUs, and high-memory systems. The technology addresses the AI inference bottleneck by splitting workloads across whatever hardware is available, potentially improving efficiency by 3x to 10x. Asgar says existing data center hardware is only utilized 15-30% of the time, wasting hundreds of billions of dollars. Launched in October with eight-figure revenues, Gimlet has partnered with chipmakers including NVIDIA, AMD, Intel, and Cerebras. The company has now raised $92 million total and employs 30 people.

Gimlet Labs
Mar 23rd, 2026
Announcing Gimlet's Series A Raise

Today, we're announcing our $80M Series A raise, led by Menlo Ventures and joined by Eclipse, Factory, Prosperity7, and Triatomic.

PR Newswire
Mar 12th, 2026
d-Matrix and Gimlet Labs deliver 10x speed boost and power efficiency for agentic AI inference

d-Matrix and Gimlet Labs have announced a partnership to deliver 10x performance improvements for agentic AI inference workloads. Gimlet Cloud will deploy d-Matrix Corsair accelerators alongside GPUs, achieving significant gains in latency and throughput per watt compared to GPU-only approaches. The solution divides workloads between GPUs and d-Matrix accelerators, with Corsair's memory-optimized architecture handling the memory-bound portions of AI models. This is particularly effective for latency-sensitive techniques like speculative decoding used in large-scale AI deployments. Gimlet's software intelligently maps workloads across multiple accelerator types and vendors. The combined solution will be available to select customers through Gimlet Cloud in the second half of 2026. Both companies emphasize power efficiency as crucial for advancing AI infrastructure amid growing energy constraints.

SiliconANGLE Media
Oct 22nd, 2025
Gimlet Labs raises $12M for AI portability

Gimlet Labs Inc. launched with $12M in funding from investors including Factory, Intel CEO Lip-Bu Tan, and others. The startup offers a platform that allows AI models to be ported across different chips without code changes, optimizing efficiency and reducing costs. It disaggregates AI workloads into components, deploying each on the most suitable chip. The platform uses a custom compiler for chip-specific optimizations and is generating "8-figure revenues" from Fortune 500 companies and AI providers.