Full-Time

Senior DGX Cloud Performance Engineer

Confirmed live in the last 24 hours

NVIDIA

10,001+ employees

Designs GPUs and AI computing solutions

Compensation Overview

$224k - $425.5k/yr

+ Equity

Expert

Company Historically Provides H1B Sponsorship

Santa Clara, CA, USA

Category
Applied Machine Learning
Deep Learning
AI Research
AI & Machine Learning
Required Skills
LLM
Microsoft Azure
Python
TensorFlow
PyTorch
Machine Learning
AWS
C/C++
Google Cloud Platform
Requirements
  • 12+ years of proven experience
  • Ability to work with large scale parallel and distributed accelerator-based systems
  • Expertise optimizing performance and AI workloads on large scale systems
  • Experience with performance modeling and benchmarking at scale
  • Strong background in Computer Architecture, Networking, Storage systems, Accelerators
  • Familiarity with popular AI frameworks (PyTorch, TensorFlow, JAX, Megatron-LM, TensorRT-LLM, vLLM, among others)
  • Experience with AI/ML models and workloads, in particular LLMs
  • Understanding of DNNs and their use in emerging AI/ML applications and services
  • Bachelor's or Master's degree in Engineering (preferably Electrical Engineering, Computer Engineering, or Computer Science) or equivalent experience
  • Proficiency in Python, C/C++
  • Expertise with at least one public CSP infrastructure (GCP, AWS, Azure, OCI, …)
Responsibilities
  • Develop benchmarks and end-to-end customer applications that run at scale and are instrumented for performance measurement, tracking, and sampling, in order to measure and optimize the performance of meaningful applications and services
  • Construct carefully designed experiments to analyze, study, and develop critical insights into performance bottlenecks and dependencies from an end-to-end perspective
  • Develop ideas on how to improve end-to-end system performance and usability by leading changes in the HW, the SW, or both
  • Collaborate with external CSPs during the full life cycle of cluster deployment and workload optimization to understand and drive standard methodologies
  • Collaborate with AI researchers, developers, and application service providers to understand difficulties and requirements, project future needs, and share best practices
  • Work with a diverse set of LLM workloads and their application areas, such as health care, climate modeling, pharmaceuticals, financial futures, and genomics/drug discovery, among others
  • Develop the vital modeling framework and the TCO analysis to enable efficient exploration and sweep of the architecture and design space
  • Develop the methodology needed to drive the engineering analysis to advise the architecture, design and roadmap of DGX Cloud
Desired Qualifications
  • Very high intellectual curiosity; confidence to dig in as needed; not afraid of confronting complexity; able to pick up new areas quickly
  • Proficiency in CUDA, XLA
  • Excellent interpersonal skills
  • PhD nice to have

NVIDIA designs and manufactures graphics processing units (GPUs) and system-on-a-chip units (SoCs) for various markets, including gaming, professional visualization, data centers, and automotive. Its products, particularly GPUs, are essential for high-performance computing and artificial intelligence applications. NVIDIA's GPUs work by processing large amounts of data simultaneously, making them well suited to tasks like gaming graphics and complex computations in AI and machine learning. Unlike many competitors, NVIDIA not only sells hardware but also offers software solutions and cloud-based services, enhancing the usability of its products. The company's goal is to lead in AI and HPC solutions by continuously investing in research and development to provide advanced technologies for a wide range of clients.

Company Size

10,001+

Company Stage

IPO

Headquarters

Santa Clara, California

Founded

1993

Simplify's Take

What believers are saying

  • Acquisition of Lepton AI boosts NVIDIA's cloud and AI capabilities.
  • Collaboration with Utilidata expands NVIDIA's role in energy sector innovations.
  • Backing AI21 strengthens NVIDIA's position in AI research and development.

What critics are saying

  • Integration challenges may arise from Lepton AI acquisition.
  • Increased competition from startups like nEye Systems threatens NVIDIA's market position.
  • Resource allocation to multiple ventures may dilute focus on core business areas.

What makes NVIDIA unique

  • NVIDIA leads in AI and HPC with cutting-edge GPU technology.
  • The company excels in cloud services with acquisitions like Lepton AI.
  • NVIDIA's partnerships enhance its influence in diverse sectors, including energy and AI.

Benefits

Company Equity

401(k) Company Match

Growth & Insights and Company News

Headcount

6 month growth

1%

1 year growth

0%

2 year growth

0%
Dealroom
Jun 2nd, 2025
Nvidia company information, funding & investors

Nvidia designs and manufactures high-performance GPUs, AI platforms, and software solutions for gaming, professional visualization, data centers, and autonomous vehicles. Here you'll find information about its funding, investors, and team.

Business Insider
May 9th, 2025
Nvidia-backed Israeli AI startup AI21 is raising a $300 million funding round

AI21, an Israeli startup building its own large language models (LLMs), is raising a $300 million funding round, a source said.

Canary Media
Apr 29th, 2025
Utilidata raises $60M for smarter grids

Utilidata has raised $60 million to explore the potential of AI chips in enhancing grid intelligence. Collaborating with Nvidia and utility partners like Portland General Electric and Duquesne Light, the projects aim to gather detailed grid data, particularly regarding distributed energy resources like rooftop solar and EV chargers. These efforts focus on optimizing grid operations through virtual power plants and distributed energy resource management systems, leveraging real-time data and communication.

SiliconANGLE
Apr 11th, 2025
nEye Systems raises $58M for AI chips

Silicon photonics startup nEye Systems raised $58M in funding led by CapitalG, with participation from Microsoft, Micron, Nvidia, and others. The Emeryville-based company is developing optical networking chips for AI data centers, promising faster, more efficient, and cost-effective data transfers. nEye's technology aims to overcome bandwidth and energy limitations of current electrical interconnects. Prototypes are ready, with production samples expected next year. Total funding exceeds $72M.

Aibase
Apr 8th, 2025
Nvidia Acquires Lepton AI for Millions

Nvidia has completed its acquisition of Lepton AI, a startup founded by former Alibaba VP Yangqing Jia, for reportedly hundreds of millions of dollars. Lepton AI, established in 2023, focuses on AI infrastructure and cloud solutions. Co-founders Yangqing Jia and Junjie Bai have joined Nvidia. Jia, a notable AI expert, previously contributed to TensorFlow at Google and led AI R&D at Alibaba.