xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity.
This is a full-time, onsite role based at our Memphis data center, where you will work alongside our Site Operations (SiteOps) engineers.
xAI is looking for a skilled Server Deployment Engineer with a strong foundation in data center server hardware and system software. We seek someone eager to expand their expertise in server deployment and optimization within a high-performance environment. As a Server Deployment Engineer at xAI, you will play a critical role in validating, testing, integrating, and provisioning server hardware in our data center. Collaborating closely with internal system software teams and external vendors, you will ensure server quality, system health, and the efficiency of server intake and provisioning processes. You are detail-oriented, quick to learn, and excel at executing tasks with precision. Your deep understanding of the equipment allows you to identify and drive efficiencies, optimizing server intake and repair processes to enhance overall performance.
Responsibilities
- Server Testing and Integration: Execute comprehensive server testing, integration, and provisioning within xAI data centers to ensure seamless deployment and operation of high-performance computing environments.
- System Diagnostics and Remediation: Diagnose and troubleshoot system faults in collaboration with vendors, implementing effective solutions to maintain optimal system performance.
- Hardware Management: Maintain a high throughput of compute and storage hardware intake, ensuring efficient processing, deployment, and integration of new hardware components.
- Automation and Tool Development: Develop, optimize, and maintain scripts and tools to automate processes, enhance system monitoring, and improve overall data center operations.
- Vendor and Team Collaboration: Lead and facilitate technical discussions with external vendors and internal teams to ensure alignment on system requirements, performance standards, and issue resolution.
Basic Qualifications
- High school diploma or equivalency certificate
- 3+ years of hands-on experience working with server, storage, compute, and network hardware, including troubleshooting, maintenance, and repair of servers and networking infrastructure
Preferred Skills and Experience
- Technical Expertise in Linux/Unix: Extensive experience in Linux/Unix environments as either a system administrator or a developer, with deep knowledge of various Linux distributions and familiarity with Linux boot processes and core system engineering principles.
- Scripting and Automation Skills: Proficiency in scripting languages such as Python, Bash, or other relevant tools, with the ability to develop scripts for automation, monitoring, and system optimization.
- Networking and Distributed Systems: Strong understanding of Ethernet networking at scale, including experience with distributed systems and network configuration in complex environments.
- Advanced Troubleshooting Skills: Demonstrated ability to diagnose complex hardware and software issues, apply systematic problem-solving approaches, and implement effective resolutions.
- Strong Communication and Collaboration: Excellent communication and interpersonal skills with the ability to work effectively with cross-functional teams and external vendors.
- Adaptability and Commitment: Highly motivated, with a strong commitment to working in a fast-paced and dynamic environment, demonstrating a proactive and hands-on approach to challenges.
Additional Requirements
- Ability to perform physical tasks for extended periods when necessary, including standing and moving hardware components.
- Willingness to work evenings, weekends, or extended hours as needed to support critical operations and meet project deadlines.
- Must comply with pre-employment and ongoing random drug and alcohol testing, in accordance with company policies.
- Comfortable working in an environment requiring exposure to noise.
Why Join Us?
Join a pioneering team at the forefront of AI and data center innovation, where your work will directly impact the development of next-generation technologies. Thrive in a fast-paced, dynamic workplace that encourages creativity, continuous learning, and personal development, offering ample opportunities to advance your skills and career. Work alongside top experts and thought leaders in the industry, collaborating on cutting-edge technologies that are redefining the landscape of AI, data centers, and high-performance computing.