Full-Time

Lead Reliability Engineer

Posted on 5/20/2024

Celestial AI

51-200 employees

Optical interconnects for high-performance computing

Compensation Overview

$175k - $200k Annually

Expert

Santa Clara, CA, USA

Bay Area location is preferred.

Category
DevOps & Infrastructure
Site Reliability Engineering
Requirements
  • Bachelor's degree in Engineering or related field; Master's or PhD degree preferred.
  • 15+ years of experience in reliability engineering, with a focus on datacenter and high-performance computing applications at the component, board, and system levels.
  • Very strong understanding of the physics of failure to drive material and process improvements for components.
  • Strong understanding of reliability principles, methodologies, and tools relevant to datacenter and HPC environments, such as reliability modeling, fault tolerance techniques, and performance optimization strategies.
  • Experience working with industry standards and guidelines specific to datacenter and HPC reliability, such as GR-468 and other relevant datacenter component qualification requirements.
  • Proven ability to lead cross-functional teams and drive reliability initiatives in fast-paced environments.
  • Excellent problem-solving skills and the ability to perform detailed root cause analysis in complex systems.
  • Effective communication skills and the ability to collaborate with internal teams and external stakeholders in the datacenter and HPC ecosystem.
Responsibilities
  • Develop and implement reliability strategies, standards, and processes customized for datacenter and high-performance computing applications, addressing unique challenges such as thermal management, power integrity, and workload variability.
  • Lead reliability testing and qualification activities tailored for datacenter and HPC environments, including stress testing, thermal cycling, and performance degradation analysis.
  • Collaborate closely with cross-functional teams, including hardware design, systems engineering, and datacenter operations, to integrate reliability considerations into product development and deployment processes.
  • Conduct thorough reliability analyses specific to datacenter and HPC applications, such as MTBF (Mean Time Between Failures) calculations, system-level fault tolerance assessments, and risk mitigation strategies.
  • Define reliability requirements and specifications for new products targeting datacenter and HPC markets, working closely with design teams to ensure compliance with industry standards and customer expectations.
  • Lead root cause analysis and corrective actions for reliability issues identified in datacenter and HPC environments, driving continuous improvement initiatives and implementing best practices.
  • Stay abreast of emerging technologies and industry trends in datacenter and HPC reliability engineering, leveraging this knowledge to enhance the reliability and performance of our systems.

Celestial AI operates in the high-performance computing sector, focusing on hyperscale data centers. The company has created a technology called Photonic Fabric™, an optical compute interconnect designed to enhance memory-sharing performance and reduce costs. This technology can lower total DRAM requirements by up to 35%, leading to significant savings for data centers. By using optical interconnects, Celestial AI both decreases power consumption and increases bandwidth capacity, which is crucial for multi-tenant cloud environments where lower-latency memory pooling can save around 23%. Additionally, the technology enables the disaggregation of High Bandwidth Memory (HBM) by using optics instead of traditional PCIe connections, addressing future needs for higher-bandwidth connections among compute units. Celestial AI aims to provide this optical connectivity to data centers, facilitating advancements in Generative AI and other complex computing tasks.

Company Size

51-200

Company Stage

Series C

Total Funding

$329.6M

Headquarters

Sunnyvale, California

Founded

2020

Simplify Jobs

Simplify's Take

What believers are saying

  • Acquisition of Rockley Photonics' IP strengthens Celestial AI's competitive advantage.
  • $175 million Series C funding supports rapid growth and innovation.
  • Rising demand for optical interconnects aligns with Celestial AI's offerings.

What critics are saying

  • Emerging competition from Lightmatter and Ayar Labs threatens market position.
  • Rapid tech advancements may outpace Celestial AI's current offerings.
  • Integration challenges with existing infrastructure could hinder technology adoption.

What makes Celestial AI unique

  • Celestial AI's Photonic Fabric™ offers unique optical compute interconnect technology.
  • The company reduces DRAM requirements by up to 35%, saving billions annually.
  • Celestial AI enables HBM disaggregation over optics, a future necessity for compute units.

Benefits

Health Insurance

Vision Insurance

Dental Insurance

Life Insurance

Company Equity

Growth & Insights and Company News

Headcount

6 month growth

3%

1 year growth

1%

2 year growth

10%
Photonics
Jan 7th, 2025
Celestial AI Acquires Rockley's Silicon Photonics Portfolio

Optical computing technologies developer Celestial AI has acquired silicon photonics intellectual property from Rockley Photonics.

Data Center Dynamics
Oct 24th, 2024
Celestial AI acquires Rockley Photonics patent portfolio for $20m

Founded in 2020, Santa Clara, California-based Celestial AI is developing an optical interconnect technology platform for data center and AI computing solutions.

EPT
Oct 23rd, 2024
Celestial AI acquires Rockley Photonics patent portfolio

EE News Europe
Oct 23rd, 2024
Celestial AI buys Rockley Photonics patents for $20m

Celestial AI has bought the silicon photonics intellectual property of Rockley Photonics, including worldwide issued and pending patents, for $20m.

Business Wire
Apr 24th, 2024
Celestial AI Announces Appointment of Diane Bryant to Board of Directors

INACTIVE