Full-Time

Senior Hardware Reliability Engineer

GPU & PCIe

CoreWeave

501-1,000 employees

Cloud service for GPU-accelerated workloads

Compensation Overview

$160k - $220k/yr

Senior, Expert

No H1B Sponsorship

Livingston, NJ, USA | New York, NY, USA | Bellevue, WA, USA | Sunnyvale, CA, USA

Hybrid work environment; remote work may be considered for candidates located more than 30 miles from an office. New hires will attend onboarding at one of the hubs within their first month.

Category
DevOps & Infrastructure
Platform Engineering
Server Administration
Cloud Engineering
Required Skills
Python
Ansible
Requirements
  • Prior experience supporting and troubleshooting data center class GPUs (preferably A100 or newer)
  • Proficiency in Ansible and Python, and experience programmatically interacting with server BMCs using IPMI or Redfish (preferably Redfish)
  • Experience using, integrating and automating data center class GPU diagnostics and troubleshooting tools
  • In-depth knowledge of server hardware, components, and management technologies, particularly GPUs and PCIe devices
  • Proven ability to stay updated with the latest industry technologies and trends
  • Previous experience collaborating with hardware vendors
  • Strong passion for automation, with a commitment to automating processes comprehensively
  • Excellent documentation skills and attention to detail
  • Strong analytical and problem-solving abilities
  • Applicants must have work authorization that does not require sponsorship from the company now or in the future.
Responsibilities
  • Troubleshoot complex GPU and PCIe related failures
  • Partner with external vendors on failure analysis
  • Track component RMAs
  • Develop and maintain hardware/firmware management services
  • Automate all aspects of the server hardware lifecycle
  • Serve as the senior point of contact for hardware escalation and troubleshooting
  • Collaborate with cross-functional teams to define hardware requirements, specifications, and system architecture
  • Create and maintain accurate documentation of hardware designs, specifications, test procedures, and results
  • Analyze and optimize the performance of hardware systems, identify bottlenecks, and propose improvements for enhanced efficiency
  • Establish processes for internal hardware testing, deployment, and performance optimization

CoreWeave provides cloud computing services that focus on GPU-accelerated workloads, which are essential for tasks requiring high computational power like Generative AI, Machine Learning, and Visual Effects rendering. Their services allow clients to access powerful computing resources without needing to invest in expensive hardware, as they operate on a pay-as-you-go basis. CoreWeave's infrastructure is built on a bare metal serverless Kubernetes platform, which enhances performance while minimizing operational burdens for clients. They cater to a variety of industries, including tech companies, film studios, and enterprises, by offering a range of NVIDIA GPUs to optimize performance and cost. The company's goal is to provide flexible and scalable computing solutions that meet the growing demands of the cloud computing market.

Company Size

501-1,000

Company Stage

IPO

Headquarters

New York City, New York

Founded

2017

Simplify's Take

What believers are saying

  • CoreWeave expanded its credit facility to $1.5 billion, supporting growth and scalability.
  • Partnership with Galaxy enhances CoreWeave's AI and HPC infrastructure at Helios Data Center.
  • Nvidia's strategy to partner beyond hyperscalers opens new opportunities for CoreWeave.

What critics are saying

  • CoreWeave shares fell more than 5% after the company missed earnings estimates in its first report since its IPO.
  • Nscale's $2.7 billion expansion could threaten CoreWeave's market share.
  • Volatility in the cryptocurrency market may impact CoreWeave's financial stability.

What makes CoreWeave unique

  • CoreWeave specializes in GPU-accelerated workloads, optimizing for AI and VFX rendering.
  • Their infrastructure uses bare metal serverless Kubernetes, enhancing performance and reducing DevOps burden.
  • CoreWeave offers a flexible pay-as-you-go model, appealing to cost-sensitive businesses.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

Life Insurance

Disability Insurance

Health Savings Account/Flexible Spending Account

Tuition Reimbursement

Mental Health Support

Family Planning Benefits

Paid Parental Leave

Hybrid Work Options

401(k) Company Match

Unlimited Paid Time Off

Catered lunch each day in our office and data center locations

A casual work environment

Growth & Insights and Company News

Headcount

6 month growth

-1%

1 year growth

2%

2 year growth

5%
PYMNTS
May 18th, 2025
Nvidia Reportedly Aiming To Expand AI Business Beyond ‘Hyperscalers’

Chipmaker Nvidia is reportedly working to lessen its dependency on Big Tech. The company is doing this by forging new partnerships to sell artificial intelligence (AI) to national governments, corporations and challengers to companies like Google, Amazon and Microsoft, the Financial Times (FT) reported Sunday (May 18). The report came days after Nvidia announced a multibillion-dollar U.S. chip deal with Saudi Arabia’s Humain, while the United Arab Emirates announced plans to build one of the world’s largest data centers in partnership with the American government, as the Gulf states work to construct massive AI infrastructure.

CNBC
May 15th, 2025
Coatue's $534M Stake in CoreWeave IPO

Philippe Laffont's Coatue Management acquired a $534 million stake in AI infrastructure provider CoreWeave during its March IPO, the largest U.S. tech IPO since 2021. CoreWeave, backed by Nvidia, reported better-than-expected revenue and projected faster growth for the year. Coatue's portfolio includes major AI-related stocks like Meta, Amazon, Microsoft, and Nvidia, and it also invested in Taiwan Semiconductor, Carvana, Skyworks Solutions, Pinterest, Tempus AI, and Astera Labs.

World of Software
May 14th, 2025
CoreWeave shares fall after missed earnings estimates in first report since Nasdaq debut

Shares in CoreWeave Inc. fell more than 5% in late trading today after the cloud artificial intelligence infrastructure provider fell very short of expectations on earnings but reported a beat on revenue in its fiscal first quarter, the company's first report since it went public in March.

VentureBeat
May 9th, 2025
OpenAI, Microsoft Tell Senate ‘No One Country Can Win AI’

The Trump administration walked back an Executive Order from former President Joe Biden that created rules around the development and deployment of AI. Since then, the government has stepped back from regulating the technology. In a more than three-hour hearing at the Senate Committee on Commerce, Science and Transportation, executives like OpenAI CEO Sam Altman, AMD CEO Lisa Su, CoreWeave co-founder and CEO Michael Intrator and Microsoft Vice Chair and President Brad Smith urged policymakers to ease the process of building infrastructure around AI development. The executives told policymakers that speeding up permitting could make building new data centers, power plants to energize data centers and even chip fabricators crucial in shoring up the AI Tech Stack and keeping the country competitive against China. They also spoke about the need for more skilled workers like electricians, easing software talent immigration and encouraging “AI diffusion” or the adoption of generative AI models in the U.S. and worldwide. Altman, fresh from visiting the company’s $500 billion Stargate project in Texas, told senators that the U.S. is leading the charge in AI, but it needs more infrastructure like power plants to fuel its next phase. “I believe the next decade will be about abundant intelligence and abundant energy

PYMNTS
May 8th, 2025
Tech Leaders Urge Congress For ‘Light-Touch’ AI Regulations

Top executives from OpenAI, Microsoft, AMD and CoreWeave urged lawmakers at a U.S. Senate hearing Thursday (May 8) to support the nation’s artificial intelligence efforts through “light-touch” regulations. “The stakes could not be higher — and Congress is right that the United States must lead the way,” OpenAI CEO Sam Altman said before the Senate […]