Full-Time

Platform Validation Engineer – Tech Lead/Principal

Customer Platforms

Posted on 10/31/2025

d-Matrix


201-500 employees

Delivers memory-integrated AI compute platforms

Compensation Overview

$175k - $260k/yr

Santa Clara, CA, USA

Hybrid

Three days on-site per week required.

Category
QA & Testing
Software Engineering
Requirements
  • Bachelor’s or Master’s degree in Electrical Engineering, Computer Science, or a related field.
  • 5+ years of experience in GPU server platform validation, preferably with PCIe-based hardware.
Responsibilities
  • Develop and execute system-level test plans for platform validation, including stress, thermal, and PCIe interoperability tests.
  • Automate test frameworks and validation workflows to improve test coverage and efficiency.
  • Drive root cause analysis and debug of failures in collaboration with hardware, firmware, and software teams.
  • Ensure platforms meet internal and external quality criteria for production readiness.
  • Document test procedures, results, and validation status across SKUs.
Desired Qualifications
  • Strong understanding of server architecture, Linux environments, and hardware-software interactions.
  • Experience with test automation (e.g., Python, Bash) and validation tools.
  • Detail-oriented with strong debugging and documentation skills.
  • Experience working with ODM and OEM vendors for GPU servers and rack-scale solutions.
  • Hands-on experience with electrical validation, functional validation, and high-speed bus validation.
  • Hands-on experience updating server firmware and using custom vendor software tools for debug.
  • Stress-testing experience.
  • Strong debugging skills across hardware (compute and networking) and host/embedded software.
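As an illustration of the test automation this role describes, here is a minimal, hypothetical Python sketch that checks a PCIe device's negotiated link speed and width against expectations, using the `current_link_speed` and `current_link_width` attributes Linux exposes in sysfs. The paths, thresholds, and helper names are assumptions for illustration, not from the posting.

```python
from pathlib import Path

def read_link_attrs(device_dir):
    """Read negotiated PCIe link speed/width from a sysfs device directory."""
    dev = Path(device_dir)
    speed = (dev / "current_link_speed").read_text().strip()   # e.g. "16.0 GT/s PCIe"
    width = int((dev / "current_link_width").read_text().strip())  # e.g. 16
    return speed, width

def check_link(speed, width, expected_speed_gts, expected_width):
    """Flag links that trained below the expected generation or width."""
    issues = []
    gts = float(speed.split()[0])  # "16.0 GT/s PCIe" -> 16.0
    if gts < expected_speed_gts:
        issues.append(f"link speed {gts} GT/s below expected {expected_speed_gts}")
    if width < expected_width:
        issues.append(f"link width x{width} below expected x{expected_width}")
    return issues

# Example: a Gen4 x16 card that trained at only x8 after a reboot cycle
print(check_link("16.0 GT/s PCIe", 8, 16.0, 16))
```

In a real validation flow this kind of check would run after every power cycle or firmware update across all SKUs, with failures fed into the debug/root-cause process the responsibilities describe.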

d-Matrix provides scalable, modular AI compute hardware and software for large datacenters, prioritizing energy efficiency and reduced data movement. Its core DIMC engine embeds compute directly into programmable memory, while a fabric of low-power chiplets delivers configurable compute resources and the accompanying software optimizes performance. This combination cuts data transfers and power use, aligning hardware design with memory-based computation for AI inference. The goal is to let large datacenters run AI workloads more efficiently at scale with customizable, modular compute platforms.

Company Size

201-500

Company Stage

Series C

Total Funding

$429M

Headquarters

Santa Clara, California

Founded

2019


Simplify's Take

What believers are saying

  • Gimlet Labs partnership validates 10x latency and throughput-per-watt gains for agentic AI inference, establishing marquee reference customer for 2H 2026 launch.
  • GigaIO acquisition adds rack-scale systems expertise and high-performance interconnect technologies, accelerating SquadRack deployment and enterprise customer expansion.
  • Series C $275M funding at $2B valuation from Microsoft, Temasek, and Qatar Investment Authority funds Raptor commercialization and international scaling through 2026.

What critics are saying

  • Nvidia controls 80%+ AI accelerator market share with entrenched customer relationships; d-Matrix risks commoditization if performance gains fail to justify switching costs.
  • Raptor 3DIMC and 4nm production delays or yield failures in 2026 could force customers to default to proven Nvidia alternatives, undermining competitive positioning.
  • Hyperscaler vertical integration (Microsoft Maia, Google TPUs, Meta custom silicon) threatens d-Matrix's addressable market if cloud providers internalize inference acceleration.

What makes d-Matrix unique

  • Digital in-memory compute architecture eliminates data movement between processing and memory, reducing latency and power consumption versus GPU-based inference.
  • Chiplet-based design with scalable SRAM and 3D stacked DRAM (3DIMC) enables models up to 100B parameters on single rack with 10x better bandwidth than HBM4.
  • Purpose-built inference accelerator (Corsair) with standards-based PCIe deployment integrates seamlessly into existing data center infrastructure without custom engineering.


Benefits

Hybrid Work Options

Growth & Insights and Company News

Headcount

6 month growth

-1%

1 year growth

0%

2 year growth

-9%
PR Newswire
Apr 2nd, 2026
d-Matrix acquires GigaIO data center business to strengthen rack-scale AI inference infrastructure

d-Matrix, a specialist in low-latency AI inference compute, has acquired GigaIO's data center business, including its SuperNODE and FabreX PCIe-based memory fabric technologies. The deal builds on a collaboration that began in 2025. The acquisition adds a systems engineering team with expertise in rack-scale infrastructure and high-performance interconnects. d-Matrix will integrate GigaIO's technologies into its AI inference platform, which includes Corsair inference accelerators, JetStream networking and Aviator software. GigaIO will continue operating independently, focusing on edge computing. The acquisition establishes d-Matrix's new engineering presence in Carlsbad, California, expanding its global footprint to six locations across North America, Europe and Asia. Financial terms were not disclosed.

StorageNewsletter
Mar 17th, 2026
Nvidia GTC 2026: d-Matrix and Gimlet Labs to deliver 10x speed ups, massive power efficiency for frontier AI workloads.

d-Matrix and Gimlet's combined solution can deliver order-of-magnitude performance increases in both inference latency and throughput per watt versus GPU-only stacks.

  • Gimlet Cloud, built for running agentic AI inference, will deploy d-Matrix Corsair low-latency, memory-optimized accelerators alongside GPUs.
  • 10x performance benefits in latency and throughput per watt compared to a GPU-only approach.
  • Job division between GPUs and d-Matrix accelerators enables faster interactivity and massive power savings.

d-Matrix, a player in low-latency AI inference compute for data centers, and Gimlet Labs, an applied AI research and product company, announced that Gimlet is incorporating d-Matrix Corsair accelerators into the Gimlet Cloud alongside traditional GPUs to deliver 10x speed-ups for agentic AI inference workloads. The combined solution is ideal for latency-sensitive workloads such as speculative decoding, which large-scale AI deployments commonly adopt to reduce latency. With d-Matrix Corsair accelerators on Gimlet Cloud, workloads already well optimized for agentic AI can achieve even greater performance gains, enabling token delivery speeds that support the industry-leading interactivity today's most critical applications require.

"Model providers are spending billions on inference, and the demand for fast tokens is higher than ever - but power remains a scarce resource," said Zain Asgar, founder and CEO, Gimlet Labs. "d-Matrix hardware is the ideal solution for the phases of inference that GPUs waste energy on. By leveraging Corsair for use cases like speculative decoding, we can deliver dramatically faster performance for our customers in the same footprint."

"From day one, d-Matrix has been uniquely focused on inference, founded on our belief that inference would not be a one-size-fits-all compute problem. As the only multi-silicon inference cloud, Gimlet is leading the industry with a fundamentally new approach that delivers dramatic leaps forward in performance that homogeneous infrastructure simply cannot deliver," said Sid Sheth, founder and CEO, d-Matrix. "With power limits capping how fast AI can advance, it's imperative that AI service providers have the right tools for the right job and that we embrace doing more with less."

Gimlet's software stack is the first to intelligently divide and map agentic workloads across accelerators spanning multiple vendors, generations, and architectures, running each segment on the most suitable hardware. Gimlet's datacenters incorporate these different hardware types and connect them via high-speed interconnects to serve frontier labs and other AI-native companies. d-Matrix Corsair's memory-optimized architecture delivers high memory bandwidth and low latency, making it ideal for running the memory-bound portions of an AI model. Corsair ships as a standard PCIe card with air cooling, enabling rapid deployment in existing data centers. The companies plan to make their combined solution available to select customers through Gimlet Cloud in 2H 2026.
Read also:
  • d-Matrix 3DIMC to deliver 10x faster inference than HBM4-based solutions; commercial debut planned with the d-Matrix Raptor inference accelerator.
  • Collaboration combines d-Matrix 3DIMC technology with Andes' high-performance RISC-V CPU IP for Raptor, d-Matrix's next-gen accelerator.
  • Series C led by a global consortium values the company at $2 billion, accelerating product and customer expansion as demand grows for faster, more efficient data center inference.
  • Arista, Broadcom and Supermicro team with d-Matrix on a disaggregated, standards-based approach for ultra-low latency batched inference, delivering gains in performance, cost and energy.

d-Matrix
Mar 16th, 2026
Going vertical: why we created a 3D DRAM solution to advance low latency AI inference.

d-Matrix scaled SRAM to build a system that runs ever larger models on single chips with the extremely low latency that brings. The next step is to rethink DRAM altogether.

Published: March 16, 2026, by the d-Matrix Team

When d-Matrix launched seven years ago, it had one goal: to build the fastest and most scalable technology to power small-batch AI inference and interactive applications. Both have become absolute table stakes in the last 12 months as user expectations have grown and tens of millions of people have flocked to interactive applications. The company's approach was to deploy a purpose-built SRAM-based inference architecture at scale, capturing the steps in inference that needed to complete fastest, were relatively low in complexity, and accounted for a significant share of the actual inference compute. But to support the full scope of AI inference, including future innovations, d-Matrix knew from the beginning that it would have to extend Corsair's performance and low latency to larger memory capacities. To do that, it went vertical: adding a layer of DRAM on top of the compute.

Agentic pipelines are becoming increasingly sophisticated, and some steps will inevitably require larger models for quality reasons, such as translation or code completion. The same performance d-Matrix brings to smaller models must extend to models at significantly larger scale, and further optimize disaggregated inference pipelines.

Why memory was the blocker - and will remain one

The chiplet-based design with on-chip SRAM and a PCIe-based architecture lets d-Matrix scale the total SRAM memory pool linearly, with an additional pool of DRAM available when needed.
This enabled several benefits:

  • Ultra-low latency, particularly for task-specific steps in agentic pipelines where interactivity determines success and a single agentic step can hold up the entire pipeline.
  • Seamless scalability to a rack-level pool of memory that can run most models on rack-scale Corsair.
  • Plug-and-play hardware that fits directly into most existing data center configurations with a low power envelope.
  • High flexibility in disaggregated pipelines, optimizing full pipelines by working in concert with other accelerators like GPUs to accelerate larger, more powerful models.

Smaller models, however, are only part of the solution, and they are not the only area where rapid innovation is happening. Frontier models, as well as recent open-weight models like Qwen, Kimi, and DeepSeek, have delivered powerful reasoning capability for complex tasks but sprawl into the hundreds of billions of parameters.

Scaling SRAM beyond a single die

It's tempting to look at a chiplet design and say you've just split a single pathway into a bunch of tiny HBM-esque pipes. But the problem has shifted to a different realm, governed by die-to-die interconnectivity. SRAM access is still adjacent to compute on-die and operates at full speed; scaling up moves the challenge to a die-to-die architectural problem - when one chiplet needs data from a neighbor. That shifts the problem space to a different set of metrics: bandwidth per millimeter of edge, latency per hop, and energy per bit transferred. Optimizing each of these makes a chiplet-based architecture behave as if it has one giant pool of ultra-fast memory rather than discrete pockets. This gives d-Matrix a way to scale up the available SRAM pool while preserving its latency and performance requirements.
But that scales elegantly only up to a point, one that falls short of the largest reasoning models and the general shift toward heavy token consumption. Rack-scale SRAM with Corsair covers a significant share of the space of AI workloads, and disaggregated pipelines with Corsair cover an even larger one. In fact, data released with partner Gimlet Labs shows as much as a 10x performance boost when deploying Corsair in a heterogeneous pipeline for small-batch inference.

Shifting to 3D stacked DRAM

Modern reasoning models aren't just larger in parameter count - they also consume substantially more tokens to reach a result, even at smaller scales. For interactive applications requiring reasoning models, the total memory footprint is therefore growing on two axes.

A stacked 3D DRAM configuration still lives in the die-to-die interconnectivity space, which allows d-Matrix to target 10x better memory bandwidth and 10x better energy efficiency with 3DIMC than with HBM4 configurations. In addition, the chiplet architecture allows for easier 3D stacking of DRAM, just as it initially allowed for easier scaling of SRAM memory pools. This addresses both the capacity and bandwidth limits that constrain SRAM scaling. d-Matrix took the passive DRAM used for capacitors and converted it into an active stack; doing so exposes every small bank in the DRAM and brings it directly to the compute engine. With 3D DRAM, the entire 3D surface area is available for connections, and the signals can run at the DRAM base clock of a few hundred MHz. The result is 20 TB/s of bandwidth per stack - 10x what HBM4 can achieve - at a power consumption of 0.3-0.4 pJ/bit, compared to 3-4 pJ/bit.

3DIMC: the industry's first 3D DRAM solution for AI inference

d-Matrix announced 3DIMC, its stacked DRAM solution, at Hot Chips in August 2025. Since then, the company has accomplished what it hoped: it has proven the approach works. d-Matrix successfully validated that Pavehawk, its 3D DRAM test chip, operates within its performance and power targets, demonstrating that the theory was not only sound but an attainable path to powering next-generation AI inference. The Pavehawk chips arrived in d-Matrix's labs in August 2025, and the team got to work testing the aggressive targets it had set, stress-testing the very first iterations of the chips across different voltages and temperature ranges. Thus far d-Matrix is seeing around 0.4 pJ/bit in worst-case scenarios, a figure that will decrease further as additional optimizations are completed.

The future of AI inference is 3D stacked memory

The answer doesn't lie simply in throwing an extra layer of DRAM on top of an existing one. Verticality opens a whole new operating space for growing memory pools and meeting the ravenous demand of low-latency, high-performance interactive apps. Corsair was the world's first accelerator to offer 2 GB of available SRAM per card, with the ability to scale up to 128 GB in a rack. A single server can host and run a Llama 3.1 8B model for specific tasks in agent pipelines, and it scales gracefully to larger models in a rack. Pavehawk is the first crack at the next problem, which will be central to d-Matrix's second-generation accelerator, Raptor. More sophisticated agentic pipelines will require increasingly sophisticated models, and even smaller models are becoming more robust and capable. Pavehawk not only enables larger models on its own; it dramatically improves disaggregated pipelines far beyond what Corsair offers.
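The bandwidth and energy figures above imply a large difference in interface power at full throughput. A quick back-of-the-envelope sketch using the article's own numbers (assuming 20 TB/s means 20 × 10^12 bytes/s, and taking the midpoints 0.35 and 3.5 pJ/bit of the quoted ranges):

```python
def io_power_watts(bandwidth_bytes_per_s, energy_pj_per_bit):
    """Interface power = bits moved per second x energy per bit."""
    bits_per_s = bandwidth_bytes_per_s * 8
    return bits_per_s * energy_pj_per_bit * 1e-12  # pJ -> J

BW = 20e12  # 20 TB/s per stack, per the article

# 3DIMC at ~0.35 pJ/bit vs an HBM4-class interface at ~3.5 pJ/bit
p_3dimc = io_power_watts(BW, 0.35)  # ~56 W
p_hbm4 = io_power_watts(BW, 3.5)    # ~560 W

print(f"3DIMC: {p_3dimc:.0f} W, HBM4-class: {p_hbm4:.0f} W")
```

At the same 20 TB/s, the order-of-magnitude gap in pJ/bit translates directly into roughly 56 W versus 560 W of data-movement power per stack, which is the efficiency argument the article is making.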
If you're interested in trying out or purchasing Corsair, you can request early access or contact d-Matrix sales directly. The company's next task is meeting the incredible demand of emerging AI workloads with high user expectations, and that starts with Pavehawk.

PR Newswire
Mar 12th, 2026
d-Matrix and Gimlet Labs deliver 10x speed boost and power efficiency for agentic AI inference

d-Matrix and Gimlet Labs have announced a partnership to deliver 10x performance improvements for agentic AI inference workloads. Gimlet Cloud will deploy d-Matrix Corsair accelerators alongside GPUs, achieving significant gains in latency and throughput per watt compared to GPU-only approaches. The solution divides workloads between GPUs and d-Matrix accelerators, with Corsair's memory-optimised architecture handling memory-bound portions of AI models. This is particularly effective for latency-sensitive tasks like speculative decoding used in large-scale AI deployments. Gimlet's software intelligently maps workloads across multiple accelerator types and vendors. The combined solution will be available to select customers through Gimlet Cloud in the second half of 2026. Both companies emphasise power efficiency as crucial for advancing AI infrastructure amid growing energy constraints.
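To make the workload-division idea concrete, here is a toy sketch of greedy speculative decoding, the latency-sensitive technique named above: a cheap draft model proposes several tokens and a stronger target model verifies them. The "models" below are stand-in callables invented for illustration; in the deployment described, the memory-bound draft/verification phases would be the natural fit for a memory-optimized accelerator.

```python
def speculative_step(target, draft, prefix, k=4):
    """One round of greedy speculative decoding.

    A cheap `draft` model proposes k tokens autoregressively; the
    `target` model then verifies them (in a real system, in a single
    batched pass), keeping the longest agreeing prefix plus one
    corrected token. Both "models" here map a token context to the
    next token.
    """
    # Draft phase: propose k tokens cheaply.
    ctx = list(prefix)
    proposed = []
    for _ in range(k):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # Verify phase: accept matches, correct the first mismatch.
    ctx = list(prefix)
    accepted = []
    for tok in proposed:
        expected = target(ctx)
        if tok != expected:
            accepted.append(expected)  # target's correction ends the round
            break
        accepted.append(tok)
        ctx.append(tok)
    else:
        accepted.append(target(ctx))  # all k matched: free bonus token

    return accepted

# Toy models: the target doubles the last token; the draft agrees
# until values get large, then falls behind.
target = lambda ctx: ctx[-1] * 2
draft = lambda ctx: ctx[-1] * 2 if ctx[-1] < 8 else ctx[-1]

print(speculative_step(target, draft, [1]))  # 3 accepted tokens + 1 correction
```

Each round can emit several tokens for roughly one target-model pass, which is why the technique is so sensitive to the latency of the draft and verification phases.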

Central News Agency (CNA)
Nov 18th, 2025
Ex-Intel CEO says Taiwan energy concerns warrant U.S. pivot

Ex-Intel CEO says Taiwan energy concerns warrant U.S. pivot. Taipei, Nov. 18 (CNA) Visiting former Intel Corp. boss Pat Gelsinger said Tuesday that concerns over Taiwan's strained energy supply justify efforts to shift more semiconductor production to the United States, despite the island's manufacturing strengths. At a news conference in Taipei, Gelsinger said Taiwan was "not in the position to have a resilient energy supply chain," a weakness he warned puts the island's chip industry "in a very precarious state." Gelsinger said that strengthening supply-chain resilience, including Taiwan Semiconductor Manufacturing Co.'s (TSMC) investment in the United States, will benefit the global semiconductor ecosystem. "More of the growth should occur in other geographies," Gelsinger said. "I encourage them to have more advanced nodes and R&D in the U.S." Despite these challenges, Gelsinger said that Taiwan's manufacturing advantages mean it should not be discouraged by potential U.S. tariffs. "There's no place like Taiwan, [where] you can have an idea at breakfast, you can have a prototype by lunch, and you can have manufacturing by dinner." Gelsinger spoke at an event announcing partnerships between his current employer, the California-based venture capital firm Playground Global, and seven companies. Among the seven, Ayar Labs unveiled a strategic partnership with Taiwan's application-specific integrated circuit (ASIC) provider Global Unichip Corp. (GUC) to integrate co-packaged optics (CPO) into GUC's advanced ASIC design services. Meanwhile, d-Matrix announced collaborations with TSMC, Alchip Technologies and packaging-and-testing giant ASE to jointly develop 3D memory-stacking solutions to accelerate AI development.

INACTIVE