Full-Time

Technical Sourcing Manager

Posted on 10/31/2025

Lambda

501-1,000 employees

Cloud-based GPU services for AI training

Compensation Overview

$188k - $282k/yr

+ Equity

San Jose, CA, USA

Hybrid

Four days on-site per week in San Jose; designated work-from-home day is Tuesday.

Category
People & HR
Required Skills
NetSuite
Computer Networking
SAP Products
Oracle
Requirements
  • Bachelor’s degree in Supply Chain, Engineering, Business, or related field
  • 10+ years of supply chain experience in data center, cloud, compute, networking, and/or AI hardware
  • Proven experience working with server, networking, storage, and/or data center technology vendors and OEMs
  • Strong technical literacy — comfortable collaborating with engineers on component specifications, product architectures, and bills of materials (BOMs)
  • Exceptional negotiation and relationship management skills; capable of influencing internal and external stakeholders
  • Direct experience managing OEM, ODM, and JDM suppliers and communicating with senior leadership
  • Deep understanding of cost structures, value chain analysis, and total cost of ownership (TCO) modeling
  • Experience in structuring, negotiating, and executing contracts (MSAs, MPAs, and SOWs) and complex agreements
  • Proficiency with supply chain management tools (e.g., ERP systems like NetSuite, SAP, Oracle)
  • Sound interpersonal and communication skills, including presenting strategic decisions to the executive team
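The TCO modeling called out in the requirements can be illustrated with a minimal sketch. All figures, vendor names, and parameters below are hypothetical, for illustration only — not real quotes or Lambda's actual model:

```python
# Minimal total-cost-of-ownership (TCO) comparison for two hypothetical
# server vendors over a 3-year ownership period. All numbers are illustrative.

def tco(unit_price, units, power_kw, pue, energy_cost_kwh, support_per_yr, years=3):
    """Sum acquisition, energy, and support costs over the ownership period."""
    acquisition = unit_price * units
    # Energy: facility draw = IT draw * PUE, billed per kWh, 24/7 operation.
    energy = power_kw * units * pue * energy_cost_kwh * 24 * 365 * years
    support = support_per_yr * years
    return acquisition + energy + support

vendor_a = tco(unit_price=250_000, units=100, power_kw=10.2, pue=1.3,
               energy_cost_kwh=0.08, support_per_yr=500_000)
vendor_b = tco(unit_price=245_000, units=100, power_kw=11.5, pue=1.3,
               energy_cost_kwh=0.08, support_per_yr=650_000)

# The cheaper sticker price does not always win once energy and support
# are included: here vendor B's higher power draw and support fees make
# it more expensive over three years.
print(f"Vendor A 3-yr TCO: ${vendor_a:,.0f}")
print(f"Vendor B 3-yr TCO: ${vendor_b:,.0f}")
```

A should-cost model works the same way from the other direction: build up the expected price from component, manufacturing, and logistics costs, then compare it against vendor quotes during negotiation.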
Responsibilities
  • Help lead the sourcing strategy for AI infrastructure components, including vendor sourcing for new products and technologies
  • Seek out and establish new vendors for key infrastructure components, technically vetting new vendors’ capabilities
  • Develop category sourcing strategies to ensure suppliers are capable of meeting Lambda’s current and future technical and business requirements
  • Build deep supplier engagement programs to secure critical allocations during constrained market conditions
  • Establish the supplier selection process, evaluate alternate suppliers, and drive the RFI/RFQ/RFP process to improve TCO and quality
  • Negotiate with suppliers and partners to establish best-in-class pricing, quality, delivery, payment, and service terms; develop benchmarks and should-cost models
  • Build an end-to-end OEM/ODM strategy to support Lambda’s growing cloud hardware scale
  • Establish and manage strategic supplier and commercial relationships with our key partners at the executive level; influence suppliers’ business models, roadmaps, and support models
  • Partner closely with Engineering, Data Center Operations, and Finance teams to align supplier capabilities with technical roadmaps and scaling needs.
  • Manage contractual agreements including master supply agreements, statements of work, and service level agreements.
  • Develop a supplier management framework, including QBRs, scorecards, and supplier performance tracking against SLAs
  • Monitor market dynamics (pricing, availability, emerging technologies) and advise on sourcing risk management strategies.
Desired Qualifications
  • Experience sourcing hardware for AI/ML infrastructure deployments (e.g., H100, B200 clusters)
  • Familiarity with sourcing and integration of liquid cooling infrastructure for high-density AI data centers
  • Prior work with tier-1 hyperscale cloud providers or AI infrastructure startups
  • Master’s degree (MBA, MS Supply Chain, or Engineering Management)
  • Experience with global sourcing strategies including Asia-based ODMs (e.g., Wistron, Pegatron, Foxconn)

Lambda Labs provides cloud-based GPU services for AI training and inference. Its AI Developer Cloud runs on NVIDIA GH200 Grace Hopper hardware to train large language models and generative AI, offering on-demand and reserved GPUs billed by the hour (for example, $1.99/hour for H100). It differentiates itself through competitive pricing, high availability, and an integrated ML stack with Lambda Stack for easy installation of PyTorch, TensorFlow, CUDA, cuDNN, and NVIDIA drivers, plus Lambda Echelon for owning infrastructure with hosting and support. Its goal is to make scalable AI development and deployment affordable by providing flexible GPU access, reliable hosting, and streamlined software deployment for teams working with large models.

Company Size

501-1,000

Company Stage

Debt Financing

Total Funding

$4.1B

Headquarters

San Jose, California

Founded

2012

Simplify Jobs

Simplify's Take

What believers are saying

  • Inference dominance favors Lambda's CPU-GPU systems excelling in KV cache and long-context workloads.
  • NVIDIA partnerships grant Lambda priority H100, H200, and GB300 allocations during shortages.
  • $1.5B Series E and $1B J.P. Morgan credit fuel 3GW AI factories by 2030 under new CEO Combes.

What critics are saying

  • CoreWeave undercuts H100 prices 25-40%, capturing 60% of Lambda's AI training workloads now.
  • NVIDIA's Q3 2026 DGX Cloud bypasses Lambda, seizing 30% enterprise inference demand.
  • Microsoft terminates $2B colocation contracts by end-2026, slashing 40% of Lambda's revenue.

What makes Lambda unique

  • Lambda delivers bare-metal NVIDIA Vera Rubin NVL72 instances in H2 2026 without hypervisor overhead.
  • Lambda deploys NVIDIA Quantum-X Photonics in 10,000-GPU GB300 NVL72 clusters for superior bandwidth.
  • Lambda Stack pre-configures all hardware with ML frameworks for instant AI workload deployment.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

401(k) Retirement Plan

401(k) Company Match

Unlimited Paid Time Off

Wellness Program

Commuter Benefits

Growth & Insights and Company News

Headcount

6 month growth

3%

1 year growth

1%

2 year growth

5%
Lambda
Mar 31st, 2026
Lambda at NVIDIA GTC 2026: our thoughts.

The industry stopped asking, "What's possible?" and started asking, "Who can deliver?" NVIDIA GTC 2026 drew over 30,000 attendees, and the tone had shifted. Engineers and infrastructure teams weren't there to explore what's possible; they were there to gauge what works and who can deliver it. Open models, inference workloads, and system-level constraints came up repeatedly, from Jensen Huang's interviews to discussions on the floor. The focus was on how to run models reliably at scale. For Lambda, that's validation of work already underway.

What the conference confirmed about Lambda's infrastructure bets

Every announcement at NVIDIA GTC 2026 was significant in its own right, but what stood out even more was the consistency of the underlying themes. Three challenges kept surfacing across keynotes, conversations, and product announcements: how to scale compute beyond the GPU, how to move data fast enough to achieve higher GPU utilization, and how to build networks that hold up at rack and data center scale.

Lambda was among the first AI-native clouds and an early NVIDIA Cloud Partner to announce bare-metal instances on NVIDIA Vera Rubin NVL72, with systems arriving in H2 2026. The unit of compute is no longer limited by GPUs alone; it's the data center. Unlocking the full performance of the data center requires direct hardware access with minimal abstraction and no virtualization overhead.

As an early launch collaborator on the NVIDIA Vera CPU, Lambda built its architecture around a principle often overlooked: CPU performance matters. From orchestrating GPU infrastructure to executing agentic tools and managing software environments, CPU performance affects how quickly models learn and how responsively agents act.

In networking, Lambda is among the first to deploy NVIDIA Quantum-X InfiniBand Photonics in production, on a 10,000-GPU NVIDIA GB300 NVL72 cluster. At this scale, networking is not just a bandwidth problem. It becomes a problem of power efficiency, reliability, and operability. Finally, Lambda's participation in the NVIDIA BlueField-4 STX ecosystem places it among a select group focused on deploying and operating systems well adapted to the data challenges of real-world inference at scale. Taken together, these decisions reflect a focus on building systems that are production-ready for large-scale training and inference.

Three structural shifts defining the next era of AI infrastructure

Inference is now the primary workload. While clusters for training remain massive, inference is now the dominant workload. Agentic systems have shifted the bottleneck to memory bandwidth, KV cache management, and data movement. Raw FLOPs matter less than the efficiency with which data moves through the system. Lambda prioritizes balanced CPU-GPU systems and memory architectures designed for continuous, long-context workloads, not just peak training throughput.

The data center is the unit of scale. The days of evaluating a GPU in isolation are over. Teams now want to understand how compute, memory, networking, power, and cooling work together. The shift to rack-scale systems that expand to data centers reflects the need for coordinated performance across the entire stack.

The market has moved to execution. The key question is no longer "what is possible?" but "what works, and who can deliver?" Throughout the conference, customers moved from exploration to vendor selection and capacity planning. Evaluation cycles are shrinking, and the need for proven ability to execute and deliver next-generation hardware is greater than ever. Lambda is shipping NVIDIA Vera Rubin NVL72 as bare-metal instances in H2 2026, with the system architecture required to run them effectively.

What engineers actually asked Lambda on the floor

Beyond Jensen's keynote, some of the most valuable signals at NVIDIA GTC came from direct conversations. What's changed is where the requirements originate. ML and AI engineers are more prescriptive about the compute configuration required to meet their specific workloads. Co-engineering is now the critical path to success: workload expertise met by infrastructure expertise.

Lambda was asked about data locality: specifically, which regions could support workloads without cross-region latency penalties. Then about long-context inference strategies, KV cache pressure at 128K+ lengths, and memory architecture under sustained load. Lambda was also asked about capacity: what compute could be provisioned, and when. When Lambda demonstrated its real-time region availability maps, its 1-Click Cluster provisioning, and its transparent Model FLOPS Utilization results, conversations moved quickly from pitch to planning. Customers pushed for proof under real workloads, not ideal benchmarks. This shift is pushing the industry from narrative-driven claims to measurable performance. The race to execute has begun.

Here is where Lambda is focused

NVIDIA GTC 2026 confirmed Lambda's trajectory. What differentiates teams now is co-engineering and execution: delivering production-grade systems that meet precise specifications and work at scale. In H2 2026, Lambda begins deploying NVIDIA Vera Rubin NVL72 as bare-metal instances. Its focus is on building balanced CPU-GPU systems with predictable, high-scale networks, designed with memory architectures that improve utilization in production. This gives customers full control over hardware and software, with no abstraction layers.

NVIDIA GTC has evolved from a developer-centric forum into a broader forum for how AI infrastructure gets built. The engineers Lambda spoke with at GTC weren't exploring; they were deciding. If you're in that stage too, Lambda would like to help. Talk to the team.

Lambda
Mar 16th, 2026
Lambda at NVIDIA GTC 2026: building the Superintelligence Cloud.

Lambda announces NVIDIA Vera CPUs, new Lambda Bare Metal Instances, NVIDIA Photonics, and NVIDIA STX coming to the Superintelligence Cloud. Today, Lambda is announcing the expansion of its AI factories to include NVIDIA Vera CPUs to power the software environments behind reinforcement learning and agentic AI, new Lambda Bare Metal Instances on NVIDIA Vera Rubin NVL72 Superclusters, a production-scale NVIDIA GB300 NVL72 Supercluster with NVIDIA Quantum-X Photonics, and its role as an early NVIDIA BlueField-4 STX adopter.

Lambda is an early NVIDIA Vera CPU launch partner

Models no longer just generate responses. They plan, call tools, run code, and interact with software environments in continuous feedback loops. Intelligence now extends beyond the model into surrounding systems, where millions of CPU-based sandbox environments execute actions and return results to GPUs. For modern agentic workloads, evaluation latency directly affects overall system performance. When sandbox environments fall behind, accelerators must wait for results. Higher per-core CPU performance increases reinforcement learning iterations per GPU hour and improves agent responsiveness, maximizing AI factory throughput across training and inference.

NVIDIA Vera brings high-density CPU capacity for AI factories:
  • 88-core CPU with high single-thread performance, tuned for latency-sensitive tasks
  • Spatial multi-threading increases agentic inference and RL sandbox density
  • Up to 1.5 TB of LPDDR5X memory capacity and 1.2 TB/s bandwidth configurations
  • Up to 1.8 TB/s CPU-to-GPU connectivity, reducing PCIe bottlenecks

The development of modern models involves both long training runs and millions of short evaluations. NVIDIA Vera reduces evaluation time, increases the density of sandboxes per rack, and stabilizes per-core throughput. The result is repeatable behavior when you scale experiments into production.

Lambda is an early NVIDIA BlueField-4 STX partner

While the industry shifts toward agentic AI, long-term memory and the processing of massive context windows are critical bottlenecks in inference. NVIDIA STX is a modular reference architecture for rack-scale AI storage, accelerating advanced inference through next-generation hardware integration and optimized KV-cache management. Lambda is an early NVIDIA STX adopter, so the storage layer never becomes the bottleneck for frontier-scale GPU clusters:
  • Context memory at scale: Up to 5x higher tokens per second and 5x greater power efficiency than traditional storage.
  • Acceleration at every layer: Full cluster integration of NVIDIA Vera CPUs, Rubin GPUs, BlueField-4 DPUs, and Spectrum-X Ethernet networking for data center-scale workloads.
  • Foundation for AI-native data platforms: High-speed data access for context memory, enterprise data, and high-performance storage use cases.

NVIDIA STX-based platforms will be available in the second half of 2026, along with Lambda's NVIDIA Vera Rubin NVL72 Superclusters.

Lambda Bare Metal Instances on NVIDIA Vera Rubin NVL72 Superclusters

Today, Lambda is announcing Bare Metal Instances on Superclusters with NVIDIA Vera Rubin NVL72. For teams running large-scale foundation model training and complex distributed workloads, such as disaggregated inference, direct hardware access matters. Virtualization overhead is not theoretical at this scale; it compounds. Bare metal removes that layer entirely, while Lambda's Bare Metal Instances provide cloud usability.

What Lambda Bare Metal Instances give you:
  • Higher performance with no hypervisor overhead
  • Faster access to the newest compute as it becomes available
  • Full control over the hardware stack with no shared neighbors
  • Complete security oversight from the firmware layer up

What Lambda built differently:
  • One-to-one mapping between instances and physical hosts, with API parity for lifecycle operations. You get direct access to CPU, GPU, memory, and local storage while managing instances the same way you manage cloud VMs.
  • With no hypervisor mediating device access, workloads run directly on the underlying hardware, and your processes communicate directly over sixth-generation NVIDIA NVLink for scale-up and NVIDIA Quantum-X800 InfiniBand for scale-out. All-reduce, tensor-parallel, and disaggregated prefill-decode traffic run over the raw fabric.
  • When a host degrades or fails, instance mobility moves your workload to healthy hardware without manual intervention, enabling faster recovery.

You get the performance of raw bare-metal servers, with programmatic provisioning, predictable maintenance, and observability built for production ML. Learn more at Maxx Garrison's session about what deployment will look like for Lambda's Bare Metal Instances with NVIDIA Vera Rubin NVL72 and NVIDIA GB300 NVL72, covering what rack-scale readiness actually requires.

Lambda's NVIDIA GB300 NVL72 Supercluster with NVIDIA Quantum-X Photonics

At scale, the fabric is the system. You can't bolt a high-performance network onto a rack that wasn't engineered for it from the start. Lambda is leading one of the largest deployments of NVIDIA Quantum-X InfiniBand Photonics co-packaged optics (CPO) switches to date, in an AI factory with 10,000+ NVIDIA GB300 GPUs. CPO switches eliminate the bandwidth bottleneck between racks and change the performance-per-watt calculus at cluster scale. Lambda announced its work on NVIDIA CPO and next-generation networking fabrics in 2025. Now it's running in production.

What Lambda engineered for:
  • Rack-first design: Power and liquid cooling are planned so racks run at sustained utilization without thermal or electrical surprises when jobs push the system hard.
  • Photonics fabric: NVIDIA Quantum-X InfiniBand Photonics CPO switches lower power, increase bandwidth, and improve resilience. That raises cluster-level bisection bandwidth and reduces energy per unit of useful work.
  • Validated NVIDIA GB300 NVL72 scale: Lambda hosts NVIDIA GB300 NVL72 clusters at the scale required for frontier training while preserving deterministic fabric behavior across the full job.

This Supercluster is built to run large NVIDIA GB300 NVL72 jobs repeatedly and reliably. That reduces surprises and lowers the cost per useful result.

Bringing it together: Lambda's full-stack validation

Lambda validates the full stack before handing over clusters: production firmware, drivers, and orchestration are tested as a single unit. Lambda then follows a pilot-to-production rollout so capacity, software, and operations arrive together:
  • Small-scale NVIDIA PODs to validate sandbox density and CPU-to-GPU connectivity before full-scale deployment
  • Phased rollouts so software and tooling scale alongside capacity
  • Out-of-band telemetry and DPU-based controls to monitor and manage the fabric without adding noise or removing resources from AI workloads

The system runs the same jobs repeatedly with minimal manual intervention. That's how infrastructure moves from working in a lab to running in production. NVIDIA Vera CPU delivers predictable CPU throughput. Rack-first engineering and photonics networking make the fabric scalable. Bare Metal Instances give teams control without operational overhead. Together, they make the Superintelligence Cloud a platform teams can trust in production.

Lambda is also participating in NVIDIA's Fleet Intelligence Early Access Program to help develop telemetry, alerting, and integrity checks that will give you earlier visibility into GPU fleet issues before they escalate into workload-affecting outages. To see how these capabilities fit into a broader AI operations strategy, and how NVIDIA and Lambda are collaborating to close the gap between standing up AI infrastructure and running it confidently at scale, catch "A Playbook - Operating Cloud AI Factories at Scale" [S81847] on Monday, March 16 at 3:00 p.m. PDT.

Engage with Lambda at NVIDIA GTC:
  • Meet with the team at booth #1507 or book an in-person session at lambda.ai/nvidia-gtc
  • Join the session on deploying Lambda's Bare Metal Instances with NVIDIA Vera Rubin NVL72 and NVIDIA GB300 NVL72.

Lambda
Feb 28th, 2026
Mila World Modeling Workshop: wrap-up

What if large language models had eyes, ears, and other sensors that let them perceive the real world? What kind of internal representation would they build? Could such systems become fully autonomous, and what scientific and technical breakthroughs would it take to get there? These questions shaped the Workshop on World Modeling, co-hosted by Mila and Lambda, a gathering dedicated to advancing the next generation of intelligent systems. Explore the full workshop: https://world-model-mila.github.io/

Mila is a premier AI research institute founded in 1993 by Turing Award winner Yoshua Bengio. Its mission is to push the boundaries of machine learning through rigorous, foundational research and interdisciplinary collaboration. Lambda supported the workshop through both funding and scientific contributions. The event brought together researchers from across AI and machine learning to tackle one of the field's most ambitious challenges: building systems that can model and understand the world.

Keynote speakers included Yoshua Bengio, Yann LeCun, Juergen Schmidhuber, Shirley Ho, Sherry Yang, and Lambda's own Amir Zadeh.

Lambda's Amir Zadeh presenting at the Mila World Modeling Workshop, 2026

Yoshua emphasized the importance of safety and autonomy, raising critical questions about alignment and control as AI systems become more capable and integrated into real-world settings. Yann and Sherry focused on the technical challenges ahead: scalable architectures, representation learning, multimodal integration, and the computational foundations required to build robust world models. Juergen and Amir addressed infrastructure and long-term strategy. Juergen reflected on lessons learned from decades of machine learning research and how those insights can guide the path toward general world models. Amir emphasized the importance of high-quality data, simulation environments, and the ability to solve real-world multimodal problems as essential stepping stones. Shirley advocated for a polymathic world model, grounded in scientific reasoning. In this view, world models should not merely capture correlations, but integrate structured knowledge from disciplines such as physics and other sciences to build deeper, causal representations of reality.

The workshop accepted 50 papers, including 7 oral presentations. Contributions spanned video modeling, AI for science, multimodal learning, large language models, and JEPA-based approaches. Among the highlights: Tal Daniel, a collaborator from Carnegie Mellon University, presented joint work with Lambda titled "World Modeling using Latent Particle Models." The paper was accepted by ICLR as an oral presentation, placing it in the top 1.18% of submissions. Read the paper: https://taldatech.github.io/lpwm-web/

The Workshop on World Modeling underscored both the promise and complexity of building AI systems that can perceive, reason about, and act in the real world, and Lambda was glad to partner with Mila to advance this work and contribute to the foundations of what comes next.

Business Wire
Feb 19th, 2026
Lambda appoints Charles Fisher as CFO to lead capital strategy and AI infrastructure expansion

Lambda, an AI cloud infrastructure company, has appointed Charles Fisher as chief financial officer. Fisher will lead capital strategy and financial operations as the company scales its AI infrastructure business. Fisher joins from Turo, where he served as CFO, and previously held the role of EVP of corporate finance and development at Charter Communications. He replaces Heather Planishek, who stepped into the CFO role from Lambda's board of directors during a critical growth phase. Founded in 2012, Lambda builds supercomputers for AI training and inference, serving tens of thousands of customers ranging from AI researchers to enterprises and hyperscalers. Fisher said building the financial infrastructure to support sustained growth will be a key priority as Lambda meets growing demand for hyperscale AI infrastructure.

Business Wire
Feb 12th, 2026
Lambda appoints ex-AWS and Snap exec Jerry Hunter as vice chairman of compute delivery

Lambda, an AI cloud infrastructure provider, has appointed Jerry Hunter as Vice Chairman, Compute Delivery and Special Advisor to the Board. Hunter will guide Lambda's long-term infrastructure strategy and support deployment of large-scale AI facilities. Hunter brings extensive experience building hyperscale infrastructure. He spent over a decade at Sun Microsystems and helped establish AWS's global data center organization. Most recently, he served as Chief Operating Officer at Snap, scaling the platform to billions in revenue. The appointment comes amid rapid growth at Lambda, driven by increasing demand for AI compute. Founded in 2012, Lambda serves tens of thousands of customers ranging from AI researchers to enterprises and hyperscalers, providing supercomputers for AI training and inference.

INACTIVE