Full-Time

Senior HPC Systems Engineer

Confirmed live in the last 24 hours

Lambda

51-200 employees

On-demand public cloud with NVIDIA GPUs for deep learning

Data & Analytics
Hardware
Enterprise Software
AI & Machine Learning

Compensation Overview

$180k - $250k Annually

+ Cash & Equity Compensation

Expert

Remote in USA + 1 more

Category
Applied Machine Learning
AI & Machine Learning
Required Skills
Kubernetes
Python
Linux/Unix
Requirements
  • Expertise in architecting, operating, and debugging large-scale HPC network and storage infrastructure
  • Experience building complex software in Python
  • Deep understanding of Linux fundamentals, especially its networking stack
  • Experience with large GPU clusters
  • Experience with virtualization and Kubernetes
  • Background in Computer Science, Electrical Engineering, Mathematics, or Physics
Responsibilities
  • Design and architect AI supercomputers for the cloud
  • Introduce technology to improve performance of HPC storage and networking infrastructure
  • Benchmark, tune, and optimize hypervisors, network, and storage
  • Set up monitoring and alerting for high availability
  • Provide guidance to HPC customers

With a focus on deep learning and generative AI, this company offers on-demand access to NVIDIA H100 Tensor Core GPUs in a public cloud, catering specifically to massive-scale AI projects. Its cloud clusters use 3200 Gbps InfiniBand networking for fast, efficient processing. Its open-source AI software stack, used by over 50,000 machine learning teams, reflects a commitment to community-driven innovation and support for industry-standard tools like PyTorch® and TensorFlow.

Company Stage

Series C

Total Funding

$932.2M

Headquarters

San Jose, California

Founded

2012

Growth & Insights
Headcount

6 month growth

36%

1 year growth

81%

2 year growth

265%