Full-Time

Senior Software Engineer

Profiling Services

Posted on 9/22/2025

NVIDIA

10,001+ employees

Designs GPUs and AI HPC platforms

Compensation Overview

$184k - $356.5k/yr

+ Equity

Company Historically Provides H1B Sponsorship

Austin, TX, USA

More locations: Santa Clara, CA, USA

In Person

Category
Software Engineering (2)
Required Skills
Python
CUDA
PyTorch
Operating Systems
C/C++
Requirements
  • BS or MS degree, or equivalent experience, in Computer Engineering, Computer Science, or a related field.
  • 8+ years of meaningful software development experience in C, C++, and Python.
  • 10+ years in system software design, operating systems fundamentals, computer architectures, performance analysis, and delivering production-quality software.
  • Strong interpersonal, verbal, and written communication, demonstrating the ability to build cross-organizational partnerships and lead technical teams through complex challenges.
  • Profiling & Performance Tools Expert: Extensive knowledge of profiling technologies (sampling, tracing), overhead analysis, and diverse profiling data (CPU/GPU events, performance counters, API traces, event correlation). Familiarity with existing profiling ecosystems and their limitations is a plus.
  • GPU & CUDA Proficiency: In-depth knowledge of CUDA APIs, runtime, streams, kernels, and GPU architecture.
  • ML Ecosystem & Performance Analysis: Familiarity with ML frameworks such as PyTorch and JAX, and knowledge of performance analysis for AI training/inference applications.
  • Large-Scale System Development & Debugging: Experience developing and debugging across complex multi-layered software systems, including user mode and kernel drivers, with a proven ability to contribute to and extend substantial codebases (100s of millions of lines).
  • Proficiency in Designing APIs and Interfaces for Profiling Tools: The ability to design robust, flexible APIs and interfaces that enable seamless integration of profiling tools with various frameworks and custom code.
  • Mastery of Problem Simplification: A history of breaking down ill-defined problems in complex technical domains, designing effective solutions, and leading teams to implement them.
Responsibilities
  • Architect and Build Scalable Systems: Drive the design and implementation of the AON profiling service's core systems. You'll master inter-process communication (IPC), memory management, and building low-overhead architectures to handle profiling data from complex multi-node, multi-process, multi-GPU, and cluster environments.
  • Elevate Software Engineering Excellence: Promote high standards in software development, including design patterns, concurrency, parallelism, and advanced debugging for asynchronous systems. Our commitment to code quality and robust testing ensures a reliable profiling service.
  • Lead, Mentor, and Innovate: Guide and mentor engineers, provide impactful code reviews, and shape technical roadmaps. Proactively identify complex technical issues within the AON project, break them down, and craft innovative solutions. Your problem-solving prowess will be crucial for AON's success with ML workloads.
  • Architect and Build High-Performance Platforms: Transform user needs into clear requirements and design documents. Explore diverse approaches to problems, making well-reasoned recommendations. Lead end-to-end feature development—from planning and prototyping to implementation, testing, and customer evaluation. This involves hands-on development across user applications, drivers, performance counter libraries, and lower-level platform/hardware abstraction layers.
  • Collaborate Across Boundaries: Partner effectively with diverse internal and external teams. Exceptional communication and collaboration skills are key to integrating AON seamlessly into the broader profiling and ML ecosystem.
Desired Qualifications
  • Pioneering Low-Overhead Profiling Systems: A track record of designing and implementing profiling systems with minimal performance impact on target workloads, especially in complex multi-process and distributed environments.
  • Deep Understanding of PyTorch Internals & CUDA Usage: A comprehensive grasp of how PyTorch uses CUDA, including tensor memory, operations, and distributed training functionalities.
  • GPU Performance Analysis & Optimization Acuity: The ability to analyze profiling data and translate it into concrete, actionable insights, particularly within CUDA and ML frameworks like PyTorch.
  • Translating Customer Needs: Skilled at turning customer requests into actionable use cases and requirements.
  • Strong understanding of system security principles.

NVIDIA designs and manufactures graphics processing units (GPUs) and computing platforms used for gaming, data centers, and artificial intelligence. These products work by using parallel processing to handle complex mathematical calculations much faster than standard computer processors, supported by a software ecosystem that allows developers to build and run AI models. Unlike competitors that may focus solely on hardware, NVIDIA integrates its chips with specialized software and cloud services to create a complete environment for high-performance tasks. The company’s goal is to provide the underlying technology necessary to power advanced computing, from realistic video game graphics to autonomous vehicles and large-scale data analysis.

Company Size

10,001+

Company Stage

IPO

Headquarters

Santa Clara, California

Founded

1993

Simplify's Take

What believers are saying

  • Agentic AI adoption at scale drives major inflection in inference demand globally.
  • Jensen Huang projects $3T-$4T global AI factory buildout through 2030.
  • Data centre networking revenue surged 263% YoY to $10.98B in Q4 FY2026.

What critics are saying

  • Nemotron 3 open weights enable AMD and Intel to replicate NVIDIA's software moat.
  • Insider selling over three months signals executive doubt about sustaining 73% growth.
  • $30B OpenAI investment exposes NVIDIA to catastrophic losses from governance collapse.

What makes NVIDIA unique

  • Vera Rubin launching July 2026 reduces inference token costs tenfold versus Blackwell.
  • Nemotron 3 Nano Omni achieves 9x higher throughput on consumer hardware like RTX 4090.
  • Clear datacenter product roadmap extends through 2028, when Feynman is due to arrive.

Benefits

Company Equity

401(k) Company Match

Growth & Insights and Company News

Headcount

6 month growth

-1%

1 year growth

-3%

2 year growth

-2%

The Associated Press
Apr 15th, 2026
Matlantis integrates NVIDIA ALCHEMI Toolkit for 10x faster materials simulation

Matlantis has integrated NVIDIA's ALCHEMI Toolkit into its materials simulation platform to accelerate industrial materials discovery. The company previously incorporated NVIDIA Warp-optimised kernels, achieving up to 10x speed improvements in atomistic calculations. The integration includes LightPFP, Matlantis' lightweight potential for large-scale simulations, which uses a server-based architecture with NVIDIA ALCHEMI Toolkit-Ops to reduce communication bottlenecks. Matlantis plans to integrate its flagship Universal Machine-Learning Interatomic Potential with the toolkit to further enhance GPU efficiency. Launched in 2021, Matlantis is a cloud-based atomistic simulator jointly developed by PFN and ENEOS. The platform uses deep learning to increase simulation speeds by tens of thousands of times and serves over 150 companies discovering materials including catalysts, batteries and semiconductors.

CNBC
Apr 14th, 2026
Nvidia stock surges 18% on 10-day winning streak fuelled by $1T GPU orders through 2027

Nvidia shares have climbed 18% over a ten-day winning streak, the longest since 2023. The stock is trading about 8% below its October all-time high of $212.19. CEO Jensen Huang revealed at last month's GTC conference that Nvidia has over $1 trillion in GPU orders through 2027, including Blackwell and next-generation Vera Rubin chips. Data centre revenue surged 75% year-over-year and now comprises 88% of the business, a dramatic shift from five years ago when gaming dominated. The rally follows major deals including Meta's February commitment to deploy millions of Nvidia chips across its global data centres. On Monday, Nvidia denied rumours it was pursuing acquisitions of PC makers Dell or HP. The company also unveiled Ising, a new family of open-source models for quantum computing.

Yahoo Finance
Apr 14th, 2026
D-Wave CEO claims quantum computers could challenge Nvidia's AI dominance with superior power efficiency

D-Wave Quantum CEO Alan Baratz claims quantum computing poses a threat to Nvidia, citing superior energy efficiency. Speaking at the Semafor World Economy Summit, Baratz said D-Wave's quantum computer uses just 10 kilowatts of power—equivalent to five or 10 GPUs—whilst solving problems that would take GPU systems nearly a million years. D-Wave shares rose nearly 16% on Tuesday, part of a 140% gain over the past year. The company reported $2.75 million in Q4 revenue, missing analyst estimates, but bookings surged 471% to $13.4 million. The $5.3 billion company recently secured a $20 million agreement with Florida Atlantic University and acquired Quantum Circuits for $550 million. However, quantum machines remain specialised tools, unable to run large language models that drive Nvidia's dominance.

Yahoo Finance
Apr 14th, 2026
Vertiv partners with Nvidia on AI data centre infrastructure as analysts raise price target to $300

Evercore ISI has reaffirmed its Buy rating on Vertiv Holdings with a price target of $280, whilst Barclays raised its target from $281 to $300 with an Overweight rating. The electrical equipment company is partnering with Nvidia on AI infrastructure development. On 16th March, Nvidia introduced its Vera Rubin DSX AI Factory reference design, with Vertiv providing critical power and cooling solutions for AI data centres. The partnership integrates Vertiv's infrastructure expertise with Nvidia's AI systems to enhance energy efficiency and performance. Vertiv is developing Vertiv OneCore Rubin DSX, a prefabricated system designed to accelerate AI factory deployment. The Brussels-headquartered company specialises in critical digital infrastructure technologies for data centres and communication networks.

Yahoo Finance
Apr 14th, 2026
Nvidia and Dell: AI infrastructure stocks to buy ahead of May earnings

Nvidia and Dell Technologies are positioned as attractive AI infrastructure investments ahead of their May earnings reports, according to recent analysis. Both companies supply critical hardware for AI computing, with demand for AI capacity continuing to outpace available resources across major cloud services. Nvidia shares have remained flat for six months despite strong fundamentals. Last quarter, its data centre business generated $62 billion in revenue, up 75% year over year, with a 75% gross margin. The company expects over $1 trillion in cumulative orders for its Blackwell and upcoming Rubin chips through 2027. Trading at 17 times next year's expected earnings, Nvidia's valuation appears discounted relative to its 66% revenue growth in fiscal year 2026. Dell Technologies similarly stands to benefit from the AI infrastructure build-out. Both companies report earnings in May.
