Full-Time

HPC Operations Engineer

Updated on 5/2/2024

CoreWeave

201-500 employees

Specialized GPU cloud provider for intensive computing

Data & Analytics
Hardware
AI & Machine Learning

Mid

Remote in USA

Required Skills
Linux/Unix
Requirements
  • 2 or more years of experience troubleshooting or administering data center or on-prem infrastructure
  • Strong understanding of Linux system administration and networking concepts
  • Ability to troubleshoot hardware and software issues and perform system maintenance tasks consistently and reliably
Responsibilities
  • Install, configure, and maintain large-scale high-performance supercomputing clusters running state-of-the-art GPUs
  • Troubleshoot hardware and software issues; escalate and coordinate as needed with data center, network and platform teams to drive resolution
  • Monitor and analyze system performance and take appropriate remediation actions for cloud health
  • Approach work with flexibility and optimism, anticipating shifting business and technical priorities
  • Create and maintain documentation of team processes, knowledge and best practices for system management
  • Think critically about day-to-day work and work collaboratively to improve team processes and efficiency

CoreWeave is a specialized cloud provider that delivers high-performance GPU resources for compute-intensive applications. It offers managed Kubernetes, virtual servers, and advanced networking, and supports industries such as VFX, rendering, and AI with significant performance and cost advantages. This focus on high-efficiency technology makes it a strong fit for professionals passionate about cutting-edge cloud solutions and high-impact computing work.

Company Stage

Series B

Total Funding

$3.3B

Headquarters

New York, New York

Founded

2017

Growth & Insights
Headcount

6 month growth

64%

1 year growth

228%

2 year growth

724%