Full-Time
Machine Learning Compiler Frontend Engineer
Posted on 4/18/2024
Develops servers for transformer inference technology
Hardware
Senior
Cupertino, CA, USA
Required Skills
Python
Requirements
- 5+ years of experience writing production-grade software.
- Able to write production-grade code in Python.
- Experience building products with LLMs.
- Experience with at least one of TensorRT, TensorRT-LLM, Transformer Engine, or vLLM.
- Strong understanding of how companies working with LLMs build their inference stacks.
- At least 1 year of work experience at a cloud provider.
- Deeply creative and able to think from first principles.
Responsibilities
- Design and develop our integrations with current transformer-specific inference libraries, such as TensorRT-LLM, Transformer Engine, Hugging Face TGI, and vLLM.
- Provide feedback to the firmware, compiler, and hardware teams based on compiler development work.
- Ensure the software we expose to customers is reliable and production-grade as soon as our servers begin to ship.
This company suits engineers passionate about cutting-edge server technology, specifically transformer inference. It specializes in powerful servers whose chips integrate the transformer architecture, aiming to lead in high-performance computing. That focus fosters a culture of technical excellence and continuous innovation.
Company Stage
Seed
Total Funding
$5.4M
Headquarters
Cupertino, California
Founded
2022
Growth & Insights
Headcount