Full-Time

Staff AI Infrastructure Engineer

XPeng Motors

1,001-5,000 employees

Designs and manufactures intelligent electric vehicles and aircraft

Data & Analytics
Robotics & Automation
Hardware
Consumer Software
AI & Machine Learning

Compensation Overview

$180k - $300k annually

+ Bonus + Equity

Senior, Expert

Santa Clara, CA, USA

Category
DevOps & Infrastructure
Cloud Engineering
Required Skills
Kubernetes
Microsoft Azure
Python
Docker
AWS
Go
Prometheus
Terraform
Ansible
C/C++
Data Analysis
Requirements
  • Bachelor's degree in Computer Science, Engineering, or related technical field
  • 5-8+ years of software engineering experience, with a strong background in developing and managing large-scale distributed systems, ideally in the AI/ML infrastructure domain
  • Proficiency in programming languages such as Python, Go, or C++, and knowledge of cloud computing platforms such as AWS or Azure
  • Strong communication and collaboration skills, with the ability to work effectively across diverse teams
  • In-depth understanding of AI/ML workflows, including model training, data processing, and inference pipelines
  • Practical experience with containerization technologies (e.g., Docker, Kubernetes), automation tools (e.g., Ansible, Terraform), and monitoring solutions (e.g., Prometheus, Grafana)
  • Exceptional problem-solving skills, capable of analyzing complex systems, identifying bottlenecks, and implementing scalable solutions
  • A passion for continuous learning and staying abreast of new technologies and best practices in the AI/ML infrastructure space
Responsibilities
  • Identify and resolve infrastructure gaps to ensure reliable, efficient, and scalable solutions
  • Develop advanced AI/ML infrastructure solutions that improve the efficiency of our ML teams
  • Design and implement solutions for critical areas, including distributed storage systems, scheduling systems, high availability capabilities, and core reliability issues within our large-scale GPU clusters
  • Monitor and optimize the performance of our AI/ML infrastructure, ensuring high availability, scalability, and efficient resource utilization
  • Develop and deploy automation tools, monitoring solutions, and operational strategies to streamline infrastructure management and reduce manual tasks
  • Work with various teams, including ML developers, data engineers, and DevOps professionals, to create a cohesive and integrated AI/ML infrastructure ecosystem

XPENG stands out as a leader in the tech industry, with its focus on intelligent mobility solutions such as electric vehicles and eVTOL aircraft, demonstrating a competitive edge in the rapidly evolving transportation sector. The company's proprietary Advanced Driver Assistance System (XPILOT) and intelligent operating system (Xmart OS) enhance the user experience by integrating technology and mobility, positioning XPENG as a pioneer in smart, people-first mobility. The company's culture fosters technological advancement, making it an exciting workplace for those passionate about shaping the future of transportation.

Company Stage

N/A

Total Funding

$8.2B

Headquarters

Guangzhou, China

Founded

2014