Full-Time

Sr. Platform Engineer-GenAI


KLA


5,001-10,000 employees

Provides process control and yield management solutions

Industrial & Manufacturing
Energy

Compensation Overview

$103k - $175.1k Annually

+ Performance Incentive Programs

Senior

Ann Arbor, MI, USA

Category
DevOps & Infrastructure
Platform Engineering
Required Skills
Kubernetes
Development Operations (DevOps)
Requirements
  • Bachelor's Degree or equivalent training/certifications in Computer Science or related IT field
  • Eight (8) years of experience implementing and maintaining AI/ML infrastructure in on-premises environments
  • Strong experience with AI/ML infrastructure and tools, including GPU clusters and Kubernetes
  • Proficiency in deploying and managing open-source GenAI components and vector databases
  • Hands-on experience with high-performance computing (HPC) environments
  • Expertise in designing and managing on-premises, cloud, and hybrid-based ML platforms
  • Solid understanding of distributed storage systems, scheduling systems, and high availability capabilities
Responsibilities
  • Identify and resolve infrastructure gaps to ensure reliable, efficient, and scalable solutions
  • Develop advanced AI/ML infrastructure solutions that enhance the efficiency of our skilled ML teams
  • Design and implement solutions for critical areas, including distributed storage systems, scheduling systems, high availability capabilities, and core reliability issues within our large-scale GPU clusters
  • Monitor and optimize the performance of our AI/ML infrastructure, ensuring high availability, scalability, and efficient resource utilization
  • Develop and deploy automation tools, monitoring solutions, and operational strategies to streamline infrastructure management and reduce manual tasks
  • Work with various teams, including ML developers, data engineers, and DevOps professionals, to create a cohesive and integrated AI/ML infrastructure ecosystem
  • Implement and manage GPU infrastructure within Kubernetes clusters to support high-performance computing and AI/ML tasks
  • Deploy and manage open-source GenAI components, such as vector databases and various AI/ML models, ensuring seamless integration and optimal performance
  • Evaluate and integrate new open-source GenAI tools and technologies to enhance the platform’s capabilities
  • Collaborate with the research and development teams to implement and optimize innovative AI/ML models and algorithms
  • Ensure the security and compliance of open-source GenAI components within the infrastructure
  • Leverage High-Performance Computing (HPC) experience to optimize and manage large-scale AI/ML workloads
  • Design, implement, and manage on-premises, cloud, and hybrid-based ML platforms to support diverse AI/ML workloads and ensure flexibility and scalability

KLA provides process control and yield management solutions primarily for semiconductor manufacturers. The company offers advanced inspection tools, metrology systems, and computational analytics that help manufacturers identify and fix defects during production. This process enhances the quality and reliability of electronic devices, leading to higher production yields. KLA distinguishes itself from competitors by focusing on high-precision equipment and software that are essential for defect detection in semiconductor manufacturing. The company's goal is to improve manufacturing processes while committing to sustainability, with a target of using 100% renewable electricity in its operations by 2030.

Company Stage

IPO

Total Funding

N/A

Headquarters

Milpitas, California

Founded

N/A


Simplify's Take

What believers are saying

  • KLA's consistent financial performance, including strong revenue and cash flow, indicates robust financial health and stability.
  • The company's validated science-based targets for GHG emissions reduction highlight its leadership in sustainability, potentially attracting environmentally-conscious talent and investors.
  • Regular cash dividends reflect a commitment to returning value to shareholders, which can be appealing to employees holding stock options.

What critics are saying

  • The semiconductor industry is highly competitive and cyclical, which can lead to periods of volatility and uncertainty for employees.
  • Achieving ambitious sustainability goals, such as 100% renewable electricity by 2030, may present operational and logistical challenges.

What makes KLA unique

  • KLA's focus on advanced process control and process-enabling solutions for the semiconductor industry sets it apart from competitors who may not specialize as deeply in this niche.
  • The company's commitment to reducing GHG emissions and achieving 100% renewable electricity by 2030 demonstrates a strong focus on sustainability, which is increasingly important in the tech industry.
  • KLA's extensive collaboration with leading customers and its expert teams of physicists, engineers, and data scientists provide a unique competitive edge in innovation and problem-solving.
