Full-Time

Head of Inference Kernels

Posted on 7/25/2025

Etched


201-500 employees

Generates open-world, interactive game environments

No salary listed

San Jose, CA, USA

In Person

Relocation support for those moving to San Jose (Santana Row)

Category
AI & Machine Learning
Required Skills
CUDA
Requirements
  • Experience designing and optimizing GPU kernels for deep learning using CUDA and assembly (ASM). You should have experience with low-level programming to maximize performance for AI operations, leveraging tools like Composable Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance.
  • Deep fluency with transformer inference architecture, optimization levers, and full-stack systems (e.g., vLLM, custom runtimes). History of delivering tangible perf wins on GPU hardware or custom AI accelerators.
  • Solid understanding of roofline models of compute throughput, memory bandwidth, and interconnect performance.
  • Experienced in running large-scale workloads on heterogeneous compute clusters, optimizing for efficiency and scalability of AI workloads.
  • Scopes projects crisply, sets aggressive but realistic milestones, and drives technical decision-making across the team. Anticipates blockers and shifts resources proactively.
Responsibilities
  • Architect Best-in-Class Inference Performance on Sohu: Deliver continuous batching throughput exceeding B200 by ≥10x on priority workloads
  • Develop Best-in-Performance Inference Mega Kernels: Develop complex, fused kernels (including basics like reordering and fusing, but also more complex work involving simultaneous computation and transmission of intermediate values for sequential matmuls) that increase chip utilization and reduce inference latency, and validate these optimizations through benchmarking and regression testing in production pipelines.
  • Architect Model Mapping Strategies: Develop system-level optimizations using a mix of techniques such as tensor parallelism and expert parallelism for optimal performance.
  • Hardware-Software Co-design of Inference-time Algorithmic Innovation: Develop and deploy production-ready inference-time algorithmic improvements (e.g., speculative decoding, prefill-decode disaggregation, KV cache offloading)
  • Build Scalable Team and Roadmap: Grow and retain a team of high-performing inference optimization engineers.
  • Cross-Functional Performance Alignment: Ensure the inference stack and performance goals are aligned with the software infrastructure teams (e.g., runtime and scheduling support), GTM (e.g., latency SLAs, workload targets), and hardware teams (e.g., instruction design, memory bandwidth) for future generations of our hardware.
Desired Qualifications
  • Experience with implementation of state-of-the-art reasoning and chain-of-thought models at production scale
  • Experience with implementation of newer AI compute operations on hardware (e.g., flash attention, long-context attention variants and alternatives)
  • Analyzed and implemented strategies such as KV-cache offloading for efficient compute resource management
  • Familiarity with linear algebra (e.g., matrix decomposition, alternative bases for vector spaces, matrix rank and its implications)
  • Managed lean, high-performing engineering teams and drove execution on timelines with high quality outcomes

Etched builds Oasis, an AI model that creates playable open-world video games. Oasis generates entire game worlds with interactive, explorable environments, enabling developers to design expansive experiences rather than just producing non-interactive media. Unlike other AI video models that yield only prompts-to-video outputs, Oasis focuses on dynamic game-generation capabilities that deliver a playable world. The company differentiates itself by offering a toolset specifically for game development, licensing technology to studios or providing a platform to create new games, rather than generic AI outputs. The goal is to empower game developers to quickly craft large, living game worlds by using AI-driven world-generation technology.

Company Size

201-500

Company Stage

Late Stage VC

Total Funding

$625.4M

Headquarters

San Jose, California

Founded

2022

Simplify Jobs

Simplify's Take

What believers are saying

  • Hyperscalers (AWS, Meta, Microsoft) acquire Etched to vertically integrate AI chips.
  • Oasis licensing generates recurring revenue from game studios and indie developers.
  • Transformer dominance in AI workloads sustains Sohu demand through 2027–2028.

What critics are saying

  • TSMC capacity constraints delay Sohu production ramp by multiple quarters.
  • Nvidia releases transformer-optimized CUDA software, neutralizing Etched's performance advantage.
  • Oasis faces IP infringement claims from game publishers over Minecraft-like generation.

What makes Etched unique

  • Sohu ASIC claims 20x faster transformer inference than Nvidia H100 GPUs.
  • Oasis generates real-time, interactive open-world games from user keyboard input.
  • Transformer-only architecture eliminates software complexity of general-purpose GPU compilers.


Benefits

Health Insurance

Dental Insurance

Vision Insurance

Unlimited Paid Time Off

Flexible Work Hours

Remote Work Options

Paid Vacation

Paid Sick Leave

Paid Holidays

Hybrid Work Options

Stock Options

Company Equity

401(k) Retirement Plan

401(k) Company Match

Performance Bonus

Profit Sharing

Employee Stock Purchase Plan

Relocation Assistance

Employee Referral Bonus

Parental Leave

Family Planning Benefits

Fertility Treatment Support

Adoption Assistance

Childcare Support

Elder Care Support

Wellness Program

Mental Health Support

Gym Membership

Phone/Internet Stipend

Home Office Stipend

Conference Attendance Budget

Professional Development Budget

Growth & Insights and Company News

Headcount

6 month growth

-2%

1 year growth

3%

2 year growth

9%
SiliconANGLE Media
Jan 14th, 2026
AI chip unicorns Etched.ai and Cerebras Systems get big funding boost to target Nvidia


Bloomberg Law
Jan 13th, 2026
Etched raises $500M at $5B valuation to challenge Nvidia in AI chip market

AI chip startup Etched has raised approximately $500 million in a funding round led by Stripes, valuing the company at $5 billion. Billionaire Peter Thiel participated in the round, alongside Positive Sum and Ribbit Capital. The investment brings Etched's total capital raised to nearly $1 billion as it seeks to compete with Nvidia in the rapidly expanding artificial intelligence processor market. The California-based company is developing specialised chips designed to challenge Nvidia's dominance in AI hardware, though specific details about its technology and commercial strategy were not disclosed.

Securities.io
Mar 13th, 2025
Emerging Technologies Shaping The Future Of AI Hardware

A New Type Of Computing

With the last few years of exponential progress in AI technology, major winners have been the builders of AI hardware. This is because modern AI, mostly using neural network technology, uses computing power in a very different way from classical computers.

Instead of performing complex calculations with a powerful CPU, they instead perform thousands or millions of simpler calculations in parallel.

(You can learn more about how neural networks were invented and work in “Investing in Nobel Prize Achievements – Artificial Neural Networks, The Basis Of AI”)

So far, graphic cards, or GPUs (Graphics processing units), have been the favored tool to develop AI, dramatically boosting the revenues and profits of leaders in the sector like Nvidia.

The market for AI hardware is expected to keep growing exceptionally quickly, at 31.2% CAGR from 2025 to 2035.

This period should also see the emergence of many new types of AI hardware, as the GPUs repurposed for AI calculation are progressively replaced by chips designed specifically for this application.

In the long run, more exotic forms of computing are likely to make their way into the AI hardware market, from application-specific designs to non-silicon chips or even using actual biological neurons.

How AI Thinking Works

The fundamental difference between classical supercomputers and AIs is how data is processed. Instead of solving complex calculations, neural networks create virtual nodes connected into a network. While the initial neural network contained barely a few dozen nodes, making a few hundred connections, modern neural networks like the ones used by ChatGPT use trillions of possible connections, reaching levels of complexity not dissimilar to the human brain.

This different method of calculus requires hardware able to perform millions of operations in parallel, even if the computing power dedicated to each is relatively small.

Luckily, this is a type of hardware that has already been in operation for many years, such as graphic rendering using GPUs, mostly for 3D simulation and videogames, and also uses this type of many small calculations in parallel.

This is why the initial (and current) winner of the race to secure enough AI chips has been Nvidia, the leader in the GPU market.

Ever Quicker GPUs

With the invention of more efficient algorithms and the quick progress in artificial intelligence they created, the potential applications of AI exploded in the 2020s.

This led to an ever-increasing race to secure enough hardware, especially Nvidia GPU in 2023.

In parallel, increasing expectations from AI potential applications require ever smarter AI, which itself requires more computing power. And while securing more GPUs was a solution, better GPUs were needed as well.

The industry delivered, with a 1,000x growth in performance in less than 8 years.

Can It Last?

There are signs that progress in GPU performances might soon slow down. First, all the “easy” improvements, like making GPUs bigger and with smaller and denser transistors, are getting maxed out

InfoQ
Nov 10th, 2024
Decart and Etched Release Oasis, a New AI Model Transforming Gaming Worlds

Decart.ai and Etched.ai recently introduced Oasis, an AI-driven model that generates a fully interactive, real-time open-world experience inspired by Minecraft.

CryptoSlate
Jun 26th, 2024
Is The Nvidia Top In As Etched Launches ASIC For LLMs 20x Faster Than H100 GPUs?

Etched is making waves in the artificial intelligence hardware space with its revolutionary new AI accelerator chip. The Silicon Valley startup, founded in 2022 by Harvard dropouts Gavin Uberti and Chris Zhu, has developed a custom application-specific integrated circuit (ASIC) called Sohu that is purpose-built to run transformer models – the architecture behind today’s most advanced AI systems.

[Figure: Etched transformer ASICs for LLMs]

Etched claims its Sohu chip can process AI workloads up to 20 times faster than Nvidia’s top-of-the-line GPUs while using significantly less power. With $120 million in fresh funding and partnerships with major cloud providers, Etched is positioning itself as a formidable challenger to Nvidia’s dominance in AI chips.

[Figure: Performance of Sohu vs top GPUs (Etched)]

Primary Venture Partners and Positive Sum Ventures led the funding round, which included participation from high-profile investors like Peter Thiel, Github CEO Thomas Dohmke, and former Coinbase CTO Balaji Srinivasan. As transformer models continue to drive breakthroughs in generative AI, Etched’s specialized hardware could reshape the landscape of AI computing.

Etched’s approach targets the complexities of GPUs and TPUs, particularly the need to handle arbitrary CUDA and PyTorch code, which demands sophisticated compilers. While other AI chip developers like AMD, Intel, and AWS have invested billions into software development with limited success, Etched is narrowing its focus. By exclusively running transformers, Etched can streamline software development for these models.

Most AI companies use transformer-specific inference libraries such as TensorRT-LLM, vLLM, or HuggingFace’s TGI

INACTIVE