Full-Time
Delivers memory-integrated AI compute platforms
$155k - $258k/yr
Santa Clara, CA, USA
Hybrid
Three days on-site per week.
d-Matrix provides scalable, modular AI compute hardware and software for large datacenters, prioritizing energy efficiency and reduced data movement. Its core DIMC (digital in-memory compute) engine embeds compute directly into programmable memory, while a fabric of low-power chiplets supplies configurable compute resources and the accompanying software stack maps workloads onto them. This combination cuts data transfers and power use by aligning the hardware design with memory-based computation for AI inference. The goal is to let large datacenters run AI workloads more efficiently at scale on customizable, modular compute platforms.
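To make the data-movement argument concrete, here is a back-of-the-envelope sketch of why keeping compute next to memory saves energy. All constants are illustrative order-of-magnitude assumptions, not d-Matrix published figures:

```python
# Toy model of the data-movement energy argument behind in-memory compute.
# The pJ/bit constants are generic illustrative assumptions, NOT d-Matrix figures.

DRAM_TRANSFER_PJ_PER_BIT = 10.0   # assumed: streaming weights from off-chip DRAM
ON_DIE_ACCESS_PJ_PER_BIT = 0.5    # assumed: accessing memory adjacent to compute

def weight_move_energy_joules(num_params: int, bits_per_param: int, pj_per_bit: float) -> float:
    """Energy to read a model's weights once, at the given cost per bit."""
    return num_params * bits_per_param * pj_per_bit * 1e-12

params = int(8e9)   # an 8B-parameter model
bits = 8            # assume 8-bit weights

conventional = weight_move_energy_joules(params, bits, DRAM_TRANSFER_PJ_PER_BIT)
in_memory = weight_move_energy_joules(params, bits, ON_DIE_ACCESS_PJ_PER_BIT)

print(f"off-chip weight streaming: {conventional:.3f} J per pass")
print(f"near-memory access:        {in_memory:.3f} J per pass")
print(f"reduction: {conventional / in_memory:.0f}x")
```

Under these assumed constants, every full pass over the weights costs roughly 20x less energy when the access happens next to the compute, which is the effect the architecture is designed to exploit.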
Company Size
201-500
Company Stage
Series C
Total Funding
$429M
Headquarters
Santa Clara, California
Founded
2019
d-Matrix, a specialist in low-latency AI inference compute, has acquired GigaIO's data centre business, including its SuperNODE and FabreX PCIe-based memory fabric technologies. The deal builds on a collaboration that began in 2025. The acquisition adds a systems engineering team with expertise in rack-scale infrastructure and high-performance interconnects. d-Matrix will integrate GigaIO's technologies into its AI inference platform, which includes Corsair inference accelerators, JetStream networking and Aviator software. GigaIO will continue operating independently, focusing on edge computing. The acquisition establishes d-Matrix's new engineering presence in Carlsbad, California, expanding its global footprint to six locations across North America, Europe and Asia. Financial terms were not disclosed.
Nvidia GTC 2026: d-Matrix and Gimlet Labs to deliver 10x speed-ups and massive power efficiency for frontier AI workloads. d-Matrix and Gimlet's combined solution can deliver order-of-magnitude improvements in both inference latency and throughput per watt versus GPU-only stacks.

* Gimlet Cloud, built for running agentic AI inference, to deploy d-Matrix Corsair low-latency, memory-optimized accelerators alongside GPUs
* 10x performance benefits in latency and throughput per watt compared to a GPU-only approach
* Job division between GPUs and d-Matrix accelerators enables faster interactivity and massive power savings

d-Matrix, a player in low-latency AI inference compute for data centers, and Gimlet Labs, an applied AI research and product company, announced that Gimlet is incorporating d-Matrix Corsair accelerators into the Gimlet Cloud alongside traditional GPUs to deliver 10x speed-ups for agentic AI inference workloads. The combined solution can deliver order-of-magnitude performance increases in both inference latency and throughput per watt compared to traditional GPU-only deployments. It is well suited to latency-sensitive workloads such as speculative decoding, which large-scale AI deployments commonly adopt to reduce latency. With d-Matrix Corsair accelerators on Gimlet's Cloud, workloads already well optimized for agentic AI can achieve even greater gains, reaching the token delivery speeds required for industry-leading interactivity in today's most critical applications.

"Model providers are spending billions on inference, and the demand for fast tokens is higher than ever - but power remains a scarce resource," said Zain Asgar, founder and CEO, Gimlet Labs. "d-Matrix hardware is the ideal solution for the phases of inference that GPUs waste energy on. By leveraging Corsair for use cases like speculative decoding, we can deliver dramatically faster performance for our customers for the same footprint."

"From day one, d-Matrix has been uniquely focused on inference, founded on our belief that inference would not be a one-size-fits-all compute problem. As the only multi-silicon inference cloud, Gimlet is leading the industry with a fundamental new approach that delivers dramatic leaps forward in performance that homogeneous infrastructure simply cannot deliver," said Sid Sheth, founder and CEO, d-Matrix. "With power limits capping how fast AI can advance, it's imperative that AI service providers have the right tools for the right job and that we embrace doing more with less."

Gimlet's software stack is the first to intelligently divide agentic workloads across accelerators spanning multiple vendors, generations, and architectures, running each segment on the most suitable hardware. Gimlet's datacenters incorporate these different hardware types and connect them via high-speed interconnects to serve frontier labs and other AI-native companies. Corsair's memory-optimized architecture delivers high memory bandwidth and low latency, making it ideal for the memory-bound portions of an AI model. Corsair ships as a standard PCIe card with air cooling, enabling rapid deployment in existing data centers. The companies plan to make the combined solution available to select customers through Gimlet Cloud in 2H 2026.
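Speculative decoding, named above as a flagship workload, is a well-known generic technique: a small, fast "draft" model proposes several tokens, and the large "target" model verifies them. The sketch below shows the basic accept/reject loop; it is a minimal illustration of the general algorithm, not Gimlet's or d-Matrix's implementation, and the 70% acceptance rate is an assumed stand-in:

```python
# Minimal sketch of speculative decoding: a cheap draft model proposes k tokens,
# the large target model verifies them, and the longest verified prefix is kept.
# Generic algorithm only; not Gimlet's or d-Matrix's actual software.
import random

random.seed(0)
VOCAB = list("abcdefgh")

def draft_model(ctx, k):
    """Stand-in for a small, low-latency draft model proposing k tokens."""
    return [random.choice(VOCAB) for _ in range(k)]

def target_accepts(ctx, token):
    """Stand-in for the target model verifying one proposed token."""
    return random.random() < 0.7  # assumed 70% acceptance rate

def speculative_step(ctx, k=4):
    """Accept the longest verified prefix of the draft; on the first
    rejection, substitute a single token from the target model."""
    accepted = []
    for tok in draft_model(ctx, k):
        if target_accepts(ctx + accepted, tok):
            accepted.append(tok)
        else:
            accepted.append(random.choice(VOCAB))  # target model's own token
            break
    return accepted

ctx = []
for _ in range(5):
    ctx += speculative_step(ctx)
print("generated:", "".join(ctx))
```

The division of labor the release describes maps naturally onto this loop: the memory-bound draft and verification passes are the kind of phase a memory-optimized accelerator is suited to, while compute-heavy phases stay on GPUs.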
Read also:
* d-Matrix 3DIMC to deliver 10x faster inference than HBM4-based solutions; commercial debut planned with d-Matrix Raptor inference accelerator
* Collaboration combines d-Matrix 3DIMC technology with Andes' high-performance RISC-V CPU IP for Raptor, d-Matrix's next-gen accelerator for blazing-fast, sustainable AI
* Series C led by global consortium values company at $2 billion, accelerates product and customer expansion as demand grows for faster, more efficient data center inference
* Arista, Broadcom and Supermicro team with d-Matrix to offer disaggregated, standards-based approach for ultra-low-latency batched inference, delivering gains in performance, cost and energy
Going vertical: why d-Matrix created a 3D DRAM solution to advance low-latency AI inference

d-Matrix scaled SRAM to build a system that runs even larger models with the extremely low latency it delivers on single chips. The next step is to rethink DRAM altogether.

Published: March 16, 2026 | By: d-Matrix Team

When d-Matrix launched seven years ago, we had one goal: to build the fastest and most scalable technology to power small-batch AI inference and interactive applications. Both have become absolute table stakes in the last 12 months as user expectations grow and tens of millions of people flock to interactive applications. Our approach was to deploy a purpose-built SRAM-based inference architecture at scale, capturing the steps in inference that needed to be completed fastest, were relatively low in complexity, and accounted for a significant volume of the actual inference compute. But to support the full scope of AI inference, including future innovations, we knew from the beginning that we would have to extend the same performance and low latency Corsair delivers to larger memory capacities. To do that, we went vertical: adding a layer of DRAM on top of the compute.

Agentic pipelines are becoming increasingly sophisticated, and some steps will inevitably require larger models for quality reasons - such as translation or code completion. The same performance we bring to smaller models must extend to models at significantly larger scale, and further optimize disaggregated inference pipelines.

Why memory was the blocker - and will remain one

Our chiplet-based design with on-chip SRAM and a PCIe-based architecture lets us scale the total SRAM memory pool linearly, with an additional pool of DRAM available when needed. This delivers several benefits:

* Ultra-low latency, particularly for task-specific steps in agentic pipelines where interactivity determines success and a single agentic step can hold up the entire pipeline.
* Seamless scalability to a rack-level pool of memory that can run most models on rack-scale Corsair.
* Plug-and-play hardware that fits directly into most existing data center configurations with a low power envelope.
* High flexibility in disaggregated pipelines, working in concert with other accelerators such as GPUs to accelerate larger, more powerful models.

Smaller models, however, are only part of the solution - nor are they the only area of rapid innovation. Frontier models, as well as recent open-weight models like Qwen, Kimi and DeepSeek, deliver powerful reasoning for complex tasks but sprawl into the hundreds of billions of parameters.

Scaling SRAM beyond a single die

It's tempting to look at a chiplet design and say we've just split a single pathway into a bunch of tiny HBM-esque pipes. But the problem has shifted to a different realm, governed by die-to-die interconnectivity. SRAM access is still adjacent to compute on-die and operates at full speed. Scaling up turns this into a die-to-die architectural problem - when one chiplet needs data from a neighbor. That shifts the problem space to a different set of metrics: bandwidth per millimeter of die edge, latency per hop, and energy per bit transferred. A rough sketch of these three metrics follows.
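The small calculator below makes the three metrics concrete. Every number in it is an illustrative assumption chosen for the example, not a d-Matrix specification:

```python
# Rough calculator for the three die-to-die interconnect metrics named above.
# All input values are illustrative assumptions, not d-Matrix specifications.

def interconnect_report(gbps_per_mm: float, edge_mm: float,
                        hop_latency_ns: float, hops: int,
                        pj_per_bit: float, bits_moved: float):
    edge_bw_tbps = gbps_per_mm * edge_mm / 1000          # aggregate edge bandwidth, Tb/s
    total_latency_ns = hop_latency_ns * hops             # latency across a multi-hop path
    transfer_energy_mj = bits_moved * pj_per_bit * 1e-9  # energy for the transfer, mJ
    return edge_bw_tbps, total_latency_ns, transfer_energy_mj

bw, lat, energy = interconnect_report(
    gbps_per_mm=1000,     # assumed shoreline density: 1 Tb/s per mm of die edge
    edge_mm=20,           # assumed 20 mm of usable die edge
    hop_latency_ns=5,     # assumed 5 ns per die-to-die hop
    hops=3,               # a three-hop path across the chiplet fabric
    pj_per_bit=1.0,       # assumed 1 pJ/bit for a die-to-die transfer
    bits_moved=8e9,       # moving 1 GB of activations between chiplets
)
print(f"edge bandwidth: {bw:.1f} Tb/s, path latency: {lat:.0f} ns, energy: {energy:.1f} mJ")
```

The point of tracking all three together is that improving any one in isolation is not enough; a fabric only behaves like a single memory pool when shoreline bandwidth, hop latency, and transfer energy are all kept in budget at once.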
Optimizing each of these metrics makes a chiplet-based architecture behave as if it has one giant pool of ultra-fast memory rather than discrete pockets. This gives us a way to scale up the available pool of SRAM while preserving our low-latency and performance requirements. But that scales elegantly only up to a point, one that falls short of the largest reasoning models and the general shift toward heavy token consumption. Rack-scale SRAM with Corsair captures a significant share of the AI workload space, and disaggregated pipelines with Corsair capture an even larger one. In fact, data released with our partner Gimlet Labs shows as much as a 10x performance boost when deploying Corsair in a heterogeneous pipeline for small-batch inference.

Shifting to 3D stacked DRAM

Modern reasoning models aren't just larger in parameter count - they also consume substantially more tokens. Even at smaller scales, reasoning models can consume significantly more tokens to reach a result. For interactive applications that require reasoning models, the total memory footprint is therefore growing on two axes. A stacked 3D DRAM configuration still lives in the die-to-die interconnectivity space, which lets us target 10x better memory bandwidth and 10x better energy efficiency with 3DIMC than with HBM4 configurations. In addition, our chiplet architecture makes 3D stacking of DRAM easier, just as it initially made scaling SRAM memory pools easier. This addresses both the capacity and the bandwidth limitations of SRAM scaling. We took the passive DRAM we use for capacitors and converted it to an active stack; by doing that, we can expose every small bank in the DRAM and bring it directly to the compute engine. With 3D DRAM we now have the entire 3D surface area to connect, and the signals can run at the DRAM base clock of a few hundred MHz. By doing that, we can get 20 TB/s of bandwidth per stack - 10x what HBM4 can achieve - at a power consumption of 0.3-0.4 pJ/bit, compared to 3-4 pJ/bit. The quick calculation after this section translates those figures into interface power.
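As a worked check on the figures above: interface power is bandwidth times energy per bit. The 20 TB/s and 0.3-0.4 pJ/bit numbers come from the text; the ~2 TB/s HBM4 per-stack bandwidth is an inferred assumption based on the stated 10x gap, not a quoted specification:

```python
# Worked arithmetic from the figures above: interface power = bandwidth x energy/bit.
# 20 TB/s and 0.4 pJ/bit are from the article; the ~2 TB/s HBM4 figure is
# inferred from the stated 10x bandwidth gap, not a quoted spec.

def io_power_watts(bytes_per_sec: float, pj_per_bit: float) -> float:
    """Memory-interface power implied by a sustained bandwidth and energy cost per bit."""
    return bytes_per_sec * 8 * pj_per_bit * 1e-12

stacked_3d = io_power_watts(20e12, 0.4)   # 20 TB/s at the measured worst-case 0.4 pJ/bit
hbm4 = io_power_watts(2e12, 4.0)          # ~2 TB/s at 4 pJ/bit

print(f"3D stacked DRAM interface: {stacked_3d:.0f} W for 20 TB/s")
print(f"HBM4-class interface:      {hbm4:.0f} W for ~2 TB/s")
```

Under these numbers both interfaces land around 64 W, which is the striking implication of the claim: roughly 10x the bandwidth for the same interface power budget.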
3DIMC: the industry's first 3D DRAM solution for AI inference

We announced 3DIMC, our stacked DRAM solution, at Hot Chips in August 2025. Since then, we have accomplished what we hoped: we have proven it works. We successfully validated that Pavehawk, our test chip for 3D DRAM, operates within its performance and power targets, demonstrating that the theory was not only sound but an attainable path to powering next-generation AI inference. Pavehawk arrived in our labs in August 2025, and we got to work testing the aggressive targets we had set for ourselves. We have since stress-tested the very first iterations of the Pavehawk chips across different voltages and temperature ranges. So far we are seeing around 0.4 pJ/bit in worst-case scenarios, and that will decrease further as we complete additional optimizations.

The future of AI inference is 3D stacked memory

The answer obviously isn't just throwing an extra layer of DRAM on top of an existing one. Verticality opens a whole new operating space for growing memory pools and meeting the ravenous demand of low-latency, high-performance interactive apps.

Corsair was the world's first accelerator to offer 2 GB of available SRAM per card, with the ability to scale up to 128 GB in a rack. A single server can host and run a Llama 3.1 8B model for specific tasks in agent pipelines, and it gracefully scales to larger models across a rack. Pavehawk is our first crack at the next problem, and it will be central to our second-generation accelerator, Raptor. More sophisticated agentic pipelines will require increasingly sophisticated models, and even smaller models are becoming more robust and capable. Pavehawk not only enables larger models on its own - it improves disaggregated pipelines in a way far beyond what Corsair offers. If you're interested in trying out or purchasing Corsair, you can request early access or contact our sales team directly. Our next task is meeting the incredible demand of emerging AI workloads and high user expectations, and that starts with Pavehawk.
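For a sense of what those SRAM figures imply, here is a rough capacity-sizing sketch against the 2 GB per card and 128 GB per rack quoted above. The quantization width and overhead factor are assumptions for illustration, not d-Matrix deployment guidance:

```python
# Rough capacity sizing against the SRAM figures quoted above (2 GB per Corsair
# card, 128 GB per rack). Quantization width and overhead are assumptions.
import math

SRAM_PER_CARD_GB = 2
SRAM_PER_RACK_GB = 128

def cards_needed(params_billions: float, bits_per_weight: int, overhead: float = 1.2) -> int:
    """Cards whose pooled SRAM can hold the weights (overhead covers KV cache etc.)."""
    weight_gb = params_billions * bits_per_weight / 8
    return math.ceil(weight_gb * overhead / SRAM_PER_CARD_GB)

for name, billions in [("Llama 3.1 8B", 8), ("70B-class model", 70)]:
    n = cards_needed(billions, bits_per_weight=4)  # assume 4-bit quantized weights
    fits = "yes" if n * SRAM_PER_CARD_GB <= SRAM_PER_RACK_GB else "no"
    print(f"{name}: ~{n} cards of pooled SRAM, fits in one rack: {fits}")
```

Under these assumptions an 8B model fits in a handful of cards and a 70B-class model still fits within one rack's 128 GB pool, which matches the article's framing of Corsair handling task-specific models per server and larger models per rack.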
d-Matrix and Gimlet Labs have announced a partnership to deliver 10x performance improvements for agentic AI inference workloads. Gimlet Cloud will deploy d-Matrix Corsair accelerators alongside GPUs, achieving significant gains in latency and throughput per watt compared to GPU-only approaches. The solution divides workloads between GPUs and d-Matrix accelerators, with Corsair's memory-optimised architecture handling memory-bound portions of AI models. This is particularly effective for latency-sensitive tasks like speculative decoding used in large-scale AI deployments. Gimlet's software intelligently maps workloads across multiple accelerator types and vendors. The combined solution will be available to select customers through Gimlet Cloud in the second half of 2026. Both companies emphasise power efficiency as crucial for advancing AI infrastructure amid growing energy constraints.
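The workload-division idea in this summary can be sketched as a simple routing rule: send memory-bound phases to the memory-optimized accelerator and everything else to GPUs. This is a generic toy scheduler illustrating the concept, not Gimlet's actual software stack, and the phase labels are assumed examples:

```python
# Illustrative sketch of the job-division idea: route each pipeline phase to
# the accelerator class it suits. Generic toy scheduler, not Gimlet's software.
from dataclasses import dataclass

@dataclass
class Phase:
    name: str
    memory_bound: bool  # token-by-token decode and verification tend to be memory-bound

def route(phase: Phase) -> str:
    """Memory-bound phases -> memory-optimized accelerator; the rest -> GPU."""
    return "memory-optimized accelerator" if phase.memory_bound else "GPU"

pipeline = [
    Phase("prefill (prompt processing)", memory_bound=False),
    Phase("speculative draft decode", memory_bound=True),
    Phase("target-model verification", memory_bound=True),
    Phase("batch embedding", memory_bound=False),
]

for p in pipeline:
    print(f"{p.name:32s} -> {route(p)}")
```

A real scheduler would weigh interconnect cost and device availability rather than a single boolean, but the split shown is the essence of the heterogeneous approach both companies describe.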
Ex-Intel CEO says Taiwan energy concerns warrant U.S. pivot. Taipei, Nov. 18 (CNA) Visiting former Intel Corp. boss Pat Gelsinger said Tuesday that concerns over Taiwan's strained energy supply justify efforts to shift more semiconductor production to the United States, despite the island's manufacturing strengths. At a news conference in Taipei, Gelsinger said Taiwan was "not in the position to have a resilient energy supply chain," a weakness he warned puts the island's chip industry "in a very precarious state." Gelsinger said that strengthening supply-chain resilience, including Taiwan Semiconductor Manufacturing Co.'s (TSMC) investment in the United States, will benefit the global semiconductor ecosystem. "More of the growth should occur in other geographies," Gelsinger said. "I encourage them to have more advanced nodes and R&D in the U.S." Despite these challenges, Gelsinger said that Taiwan's manufacturing advantages mean it should not be discouraged by potential U.S. tariffs. "There's no place like Taiwan, [where] you can have an idea at breakfast, you can have a prototype by lunch, and you can have manufacturing by dinner." Gelsinger spoke at an event announcing partnerships between his current employer, the California-based venture capital firm Playground Global, and seven companies. Among the seven, Ayar Labs unveiled a strategic partnership with Taiwan's application-specific integrated circuit (ASIC) provider Global Unichip Corp. (GUC) to integrate co-packaged optics (CPO) into GUC's advanced ASIC design services. Meanwhile, d-Matrix announced collaborations with TSMC, Alchip Technologies and packaging-and-testing giant ASE to jointly develop 3D memory-stacking solutions to accelerate AI development.