Full-Time

HPC Network Engineer

InfiniBand


CoreWeave

501-1,000 employees

Cloud provider for GPU-accelerated workloads

Data & Analytics
Enterprise Software
AI & Machine Learning

Compensation Overview

$160k - $210k annually

Mid-level

Livingston, NJ, USA

More locations: New York, NY, USA | Bellevue, WA, USA | Sunnyvale, CA, USA

Hybrid workplace; requires in-office presence.

Category
IT Support
Network Administration
IT & Security
Required Skills
Linux/Unix
Requirements
  • Proficient in InfiniBand configuration and management.
  • Solid understanding of network architectures, topologies, best practices, and techniques for high performance and availability.
  • Familiarity with optical networking hardware.
  • Experience in Linux system administration.
  • Proficiency in at least one scripting language.
  • Team player with effective collaboration skills.
  • Ability to manage multiple tasks and projects concurrently.
Responsibilities
  • Continuously monitor the performance and overall health of InfiniBand fabrics, including switches, host adapters, and nodes. This involves using existing monitoring tools and, where needed, developing new ones to provide comprehensive visibility and timely detection of issues or anomalies.
  • Investigate and resolve issues within InfiniBand fabrics: diagnose connectivity problems, identify and remove performance bottlenecks, and address errors or failures in fabric components.
  • Support and collaborate with other teams that manage and operate HPC clusters built on InfiniBand, offering expertise, guidance, and troubleshooting help to keep the clusters running smoothly and performing optimally.
  • Help install large fabrics, organizing and working with teams, onsite personnel, and customers to bring fabrics from day 0 to full operation.
  • Work with configuration tooling and operations teams to carry out maintenance and upgrades of the fabrics' switches and control plane.

CoreWeave provides cloud computing services that focus on GPU-accelerated workloads, which are essential for tasks requiring high computational power like Generative AI, Machine Learning, and VFX rendering. Their services allow clients to access powerful computing resources without needing to invest in expensive hardware, operating on a pay-as-you-go model. This flexibility is particularly beneficial for tech companies, film studios, and enterprises that need scalable solutions. CoreWeave's infrastructure is built on a bare metal serverless Kubernetes platform, which enhances performance while minimizing operational burdens for clients. By offering a variety of NVIDIA GPUs, they enable clients to optimize performance and costs based on their specific needs. The goal of CoreWeave is to provide efficient and scalable cloud computing resources tailored to industries that demand high-performance computing.

Company Stage

N/A

Total Funding

$2.3B

Headquarters

New York City, New York

Founded

2017

Growth & Insights
Headcount

6 month growth

53%

1 year growth

174%

2 year growth

842%

Simplify's Take

What believers are saying

  • Securing $1.1 billion in funding positions CoreWeave for aggressive growth and innovation in the AI and HPC sectors.
  • The appointment of former AWS executive Chetan Kapoor as Chief Product Officer brings valuable expertise and leadership to drive product strategy during a hypergrowth phase.
  • CoreWeave's $2.2 billion investment in European data centers demonstrates their commitment to expanding global reach and meeting surging demand for AI infrastructure.

What critics are saying

  • Competition from giants like AWS, which is launching high-core-count instances, could pressure CoreWeave to keep innovating to maintain its edge.
  • Rapid expansion, including significant investments in new data centers, could strain resources and operational capabilities.

What makes CoreWeave unique

  • CoreWeave specializes in GPU-accelerated workloads, setting it apart from general cloud service providers like AWS and Azure.
  • Their fully managed, bare metal serverless Kubernetes platform offers high performance with reduced operational burden, a unique selling point in the cloud computing market.
  • CoreWeave's strategic partnerships, such as with Bloom Energy for on-site power generation, enhance their infrastructure's reliability and sustainability.
