Full-Time

HPC Operations Manager

Hardware Engineering

NVIDIA

10,001+ employees

Designs GPUs and AI computing solutions

Automotive & Transportation
Enterprise Software
AI & Machine Learning
Gaming

Compensation Overview

$272k - $425.5k Annually

+ Equity

Senior, Expert

Austin, TX, USA

More locations: Santa Clara, CA, USA | Durham, NC, USA | Westford, MA, USA

Category
Hardware Engineering
Hardware Validation & Testing
Required Skills
Development Operations (DevOps)
Linux/Unix
Requirements
  • B.S. or M.S. in Computer Science, Computer Engineering, Information Science (or equivalent experience)
  • 15+ years of overall professional experience
  • 5+ years managing IT infrastructure teams of 10+ people
  • 10+ years of experience running Linux servers, NFS storage, and Ethernet networks
  • Knowledge of HPC schedulers (IBM LSF preferred)
  • Knowledge of hardware design workflows (EDA tools and methodology)
  • Experience using project management and capacity planning software
  • Datacenter operations (rack and stack, maintenance)
Responsibilities
  • Collaborate with partners to develop programs around storage, networking, and compute in data centers
  • Lead, cultivate, and mentor a multi-national team of sysadmins and DevOps engineers
  • Ensure the highest reliability of HPC clusters and develop critical metrics
  • Identify failures, lead retrospective analysis, and develop improvement action plans
  • Evaluate the latest technologies and recommend future evolution of the infrastructure
  • Work multi-functionally with hardware engineering leaders to support chip design needs
  • Lead all aspects of the HPC scheduler (LSF) and ensure delivery of forecasted compute demand
  • Track software licensing servers and drive efficient license utilization
  • Develop and manage program schedules, milestones and deliverables
  • Regularly communicate program status and key issues to senior management
Desired Qualifications
  • HPC storage (e.g., NetApp, Pure Storage, Lustre, ZFS, Isilon)
  • InfiniBand (operations, debugging, performance tuning)
  • Software development, especially in a DevOps context
  • Knowledge of relational databases, data lakes, metrics/visualization/analytics platforms
  • Deploying and maintaining FlexLM-based software license servers
  • Established relationships with enterprise-level equipment suppliers

NVIDIA designs and manufactures graphics processing units (GPUs) and system-on-a-chip units (SoCs) for various markets, including gaming, professional visualization, data centers, and automotive. Its main products are GPUs that enhance gaming experiences and support professional applications, along with AI and high-performance computing platforms tailored for developers and data scientists. NVIDIA differentiates itself from competitors by focusing on advanced technology and continuous innovation, ensuring its products meet the evolving needs of users. The company's goal is to lead in AI and HPC solutions, providing powerful tools and services that enable clients to achieve immersive experiences and drive advancements in their respective fields.

Company Stage

IPO

Total Funding

$19.5M

Headquarters

Santa Clara, California

Founded

1993

Growth & Insights
Headcount

6 month growth

0%

1 year growth

0%

2 year growth

0%
Simplify's Take

What believers are saying

  • Acquisition of VinBrain enhances NVIDIA's AI-driven healthcare solutions.
  • Investment in Nebius Group boosts NVIDIA's AI infrastructure capabilities.
  • Partnership with Serve Robotics aligns with NVIDIA's focus on robotics and AI applications.

What critics are saying

  • Increased competition from AI startups like xAI challenges NVIDIA's market position.
  • Serve Robotics' rapid expansion may lead to financial strain if market growth lags.
  • Integration challenges from VinBrain acquisition may affect NVIDIA's operational efficiency.

What makes NVIDIA unique

  • NVIDIA leads in AI and HPC solutions with cutting-edge GPU technology.
  • The Omniverse platform enhances NVIDIA's capabilities in industrial AI and digital twins.
  • NVIDIA's cloud services, like CloudXR, offer scalable solutions for AI and machine learning.

Benefits

Company Equity

401(k) Company Match