Full-Time
Machine Learning Applications Engineer
Posted on 4/18/2024
Develops servers for transformer inference
Hardware
Junior
Cupertino, CA, USA
Required Skills
Python
Requirements
- Deeply creative and able to think from first principles
- Good understanding of LLM architectures and how to use them to build applications
- 1+ year(s) of work experience at a cloud provider, AI company, or LLM startup
- Experience writing performant real-time code and proficiency in Python
- Breadth of knowledge about current research on large language models
Responsibilities
- Provide input for engineers designing our integrations with current transformer-specific inference libraries, like TensorRT-LLM, TransformerEngine, Hugging Face TGI, and vLLM.
- Help profile and understand where latency comes from in modern LLM serving stacks
- Help customers create products that leverage the unique capabilities of model-specific silicon
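The latency-profiling work described above typically starts by separating time-to-first-token (prefill) from inter-token gaps (decode). A minimal sketch of that measurement, using a stand-in generator in place of a real streaming inference server (names like `measure_latency` and `fake_server` are illustrative, not from any library mentioned here):

```python
import time

def measure_latency(token_stream):
    """Record time-to-first-token (TTFT) and per-token gaps from a
    token iterator -- a common first step when profiling an LLM
    serving stack. Real stacks stream tokens over HTTP/gRPC; this
    works on any iterator."""
    start = time.perf_counter()
    timestamps = []
    tokens = []
    for tok in token_stream:
        timestamps.append(time.perf_counter())
        tokens.append(tok)
    ttft = timestamps[0] - start
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {"ttft": ttft, "inter_token": gaps, "tokens": tokens}

def fake_server(tokens, prefill_delay=0.05, decode_delay=0.01):
    """Stand-in for a streaming server: one prefill delay, then
    steady decode steps between tokens."""
    time.sleep(prefill_delay)  # simulated prompt processing
    for t in tokens:
        yield t
        time.sleep(decode_delay)  # simulated per-token decode

stats = measure_latency(fake_server(["Hello", ",", " world"]))
```

With real serving stacks the same split shows up as separate metrics (e.g. TTFT vs. tokens/sec), which is why prefill and decode are usually profiled independently.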
This company is an excellent workplace for those passionate about cutting-edge server technology, specifically transformer inference. By specializing in servers whose chips are purpose-built for transformer architectures, the company delivers high-performance computing solutions and fosters a culture of technical excellence and continuous innovation among its team.
Company Stage
Seed
Total Funding
$5.4M
Headquarters
Cupertino, California
Founded
2022
Growth & Insights
Headcount