Full-Time

Engineer – Fleet Monitoring & Analysis

Updated on 12/20/2024

CoreWeave

501-1,000 employees

Cloud service for GPU-accelerated workloads

Enterprise Software
AI & Machine Learning

Compensation Overview

$160k - $185k annually

Junior, Mid

Locations: Livingston, NJ, USA | New York, NY, USA | Bellevue, WA, USA | Sunnyvale, CA, USA

Hybrid workplace; in-office presence required.

Category
DevOps & Infrastructure
Site Reliability Engineering
DevOps Engineering
Requirements
  • 2 or more years of experience in software or infrastructure engineering.
  • Experience with automation and orchestration workflows.
  • Knowledge of server hardware and components, and of the technologies and strategies used to manage physical infrastructure at scale.
  • Experience implementing metrics collection and alerting on standard platforms.
  • Belief in the value of automation, and a commitment to championing practices that drive reliability and prioritize the CoreWeave customer experience.
Responsibilities
  • Design and implement large-scale server observability solutions to continually improve the stability of CoreWeave’s global hardware fleet.
  • Adapt, extend, and implement open-source solutions to augment the depth and breadth of our visibility into our operating environment.
  • Generate and maintain custom reports, alarms, and visualizations to help teams understand and respond to the fleet's growth and change.
  • Create test plans, deployment automation, dashboards, alerts, and insights into our fleet operations, as well as participate in the Fleet Engineering Developers’ on-call rotation.

CoreWeave provides cloud computing services focused on GPU-accelerated workloads, which are essential for computationally intensive tasks such as generative AI, machine learning, and VFX rendering. Its services let clients access powerful computing resources on a pay-as-you-go model, without investing in expensive hardware. This flexibility is particularly valuable for tech companies, film studios, and enterprises that need scalable solutions. CoreWeave runs a fully managed, bare metal serverless Kubernetes platform, which improves performance while reducing the operational burden on clients. Unlike many competitors, CoreWeave offers a wide selection of NVIDIA GPUs, so clients can balance performance and cost for their specific needs. The company's goal is to provide efficient, scalable computing resources that meet the growing demands of a range of industries.

Company Stage

Private

Total Funding

$1.6B

Headquarters

New York City, New York

Founded

2017

Growth & Insights
Headcount

6 month growth

35%

1 year growth

146%

2 year growth

828%

Simplify's Take

What believers are saying

  • CoreWeave's strategic partnership with Dell enhances their AI cloud service competitiveness.
  • The planned data center in Canada with Cohere expands CoreWeave's North American presence.
  • CoreWeave's $600M funding for a Virginia data center indicates strong growth potential.

What critics are saying

  • Increased competition from AMD-backed Vultr may lead to price wars.
  • Potential delays in Virginia data center completion could impact demand fulfillment.
  • Dependence on NVIDIA technology poses risks if supply chain issues arise.

What makes CoreWeave unique

  • CoreWeave specializes in GPU-accelerated workloads, catering to AI and VFX industries.
  • Its infrastructure uses bare metal serverless Kubernetes for high performance and a reduced DevOps burden.
  • CoreWeave offers a broad selection of NVIDIA GPUs, optimizing performance and cost for clients.
