Full-Time

GPGPU Software Architect / Principal Engineer

Posted on 8/5/2025

XPENG Motors

1,001-5,000 employees

Designs and manufactures intelligent electric vehicles and aircraft

Compensation Overview

$241.8k - $409.2k/yr

+ Bonus + Equity

Santa Clara, CA, USA + 1 more

More locations: San Diego, CA, USA

In Person

Category
Software Engineering
Required Skills
TensorFlow
CUDA
PyTorch
Requirements
  • 10+ years in systems software, with at least 5 years designing CUDA compute stacks
  • Led end-to-end development of at least one generation of a GPU runtime or AI acceleration library
  • Comprehensive mastery of PTX/SASS, CUDA Driver API, and cuBLAS/cuDNN internals; experience with LLVM NVPTX backend
  • Deep understanding of GPU micro-architecture, including SM architecture, warp schedulers, shared-memory bank conflicts, and Tensor Core pipelines
  • Proficiency with PCIe/CXL/RDMA topologies, NUMA settings, and GPU Direct RDMA/Storage
Responsibilities
  • Develop and refine a comprehensive 3-year roadmap for a software stack compatible with CUDA, encompassing Runtime, Driver, Compiler, Profiler, Debugger, and AI acceleration libraries
  • Define binding specifications that link our upcoming GPU ISA to CUDA APIs, ensuring forward compatibility with CUDA 12.x features
  • Evaluate and integrate the latest technological advancements: CUDA Graph, Transformer Engine, virtual memory management, CUDA dynamic parallelism, CUTLASS 3.x, TMA, Blackwell FP4, among others
  • Create a modular, layered Runtime architecture: CUDA → HAL → Kernel → Hardware, applicable across emulators and actual silicon
  • Define the task launch protocol, including Queue, Stream, Event, and Graph, as well as the memory model
  • Design a dual-mode (JIT & offline) compiler supporting LTO, PGO, Auto-Tuning, and efficient PTX→ISA microcode caching
  • Develop GPU virtualization schemes (MIG) that work across processes and containers
  • Implement an end-to-end performance model: Python API → CUDA Runtime → Driver → ISA → Micro-architecture → Board-level interconnect
  • Build an observability platform: Nsys-compatible traces, real-time Metric-QPS dashboards, and an AI Advisor for identifying bottlenecks automatically
  • Manage internal AI benchmarks as the single source of truth. Benchmarks include MLPerf Inference, Stable Diffusion XL, and 70B LLM
  • Co-design, with our hardware architecture team, an ISA that is compatible with CUDA Compute Capability 12.x
  • Collaborate with AI framework teams (PyTorch, TensorFlow, JAX, ONNX Runtime) to build fully reusable kernel libraries
  • Partner with Cloud and K8s teams to co-develop Device Plugins, GPU Operators, and RDMA Network Policies
Desired Qualifications
  • None

XPENG stands out as a leader in the tech industry, with its focus on intelligent mobility solutions such as electric vehicles and eVTOL aircraft, demonstrating a competitive edge in the rapidly evolving transportation sector. The company's proprietary Advanced Driver Assistance System (XPILOT) and intelligent operating system (Xmart OS) enhance the user experience by integrating technology and mobility, positioning XPENG as a pioneer in smart, people-first mobility. The company's culture fosters technological advancement, making it an exciting workplace for those passionate about shaping the future of transportation.

Company Stage

N/A

Total Funding

$8.2B

Headquarters

Guangzhou, China

Founded

2014
