Full-Time

Member of Technical Staff: Machine Learning Infrastructure Engineer

Posted on 7/18/2024

Essential AI

11-50 employees

AI and machine learning model development

AI & Machine Learning

Senior, Expert

San Francisco, CA, USA

Category
Applied Machine Learning
AI & Machine Learning
DevOps & Infrastructure
Cloud Engineering
DevOps Engineering
Software Engineering
Required Skills
Kubernetes
Docker
AWS
Google Cloud Platform
Requirements
  • 6+ years of relevant industry experience leading the design of large-scale production ML infrastructure systems
  • Strong understanding of the architectures of new AI accelerators such as TPUs, IPUs, and HPUs
  • Knowledge of parallel computing concepts and distributed systems
  • Experience training and building large language models with frameworks such as Megatron and DeepSpeed
  • Experience with MLPerf or internal production workloads
  • Experience with INT8/FP8 training and inference, quantization, and/or distillation
  • Knowledge of container technologies such as Docker and Kubernetes, and of cloud platforms such as AWS and GCP
  • Intermediate fluency with networking fundamentals such as VPCs, subnets, routing tables, and firewalls
Responsibilities
  • Design, build, and maintain scalable machine learning infrastructure to support model training, inference, and applications
  • Design and implement scalable machine learning and distributed systems for training and scaling LLMs
  • Develop tools and frameworks to automate and streamline ML experimentation and management
  • Collaborate with other researchers and product engineers to enhance product experiences through large language models
  • Optimize performance and efficiency across different accelerators

This company excels in developing machine learning and artificial intelligence models tailored for everyday applications, making critical technology accessible in multiple languages. It stands out for its commitment to enhancing language translation capabilities, thereby promoting inclusiveness and accessibility in global communications. Working here offers a chance to be at the forefront of AI technology while contributing to solutions that bridge language barriers worldwide.

Company Stage: Series A

Total Funding: $64.5M

Headquarters: San Francisco, California

Founded: 2023

Growth & Insights
Headcount

6 month growth: 80%

1 year growth: 80%

2 year growth: 80%
INACTIVE