Full-Time

HPC Engineer

Machine Learning Infrastructure, US Remote

Posted on 4/6/2024

Hugging Face

201-500 employees

Develops advanced NLP models for text tasks

AI & Machine Learning

Senior, Expert

Remote in USA

Category
DevOps & Infrastructure
Site Reliability Engineering
Cloud Engineering
DevOps Engineering
Required Skills
Rust
Python
Git
Data Structures & Algorithms
AWS
Go
Development Operations (DevOps)
Linux/Unix
Google Cloud Platform
Requirements
  • 7+ years of experience in a DevOps or infrastructure engineering role, building machine learning infrastructure and working with large GPU clusters
  • Knowledge of cloud providers such as AWS and GCP, infrastructure-as-code frameworks, and observability tools
  • Familiarity with the Python scientific stack and PyTorch
  • Experience with data structures, data modeling, and database management as well as object and file storage systems
  • Strong communication, collaboration, and documentation skills
  • Experience with Linux, Git, containers, networking, and command-line tools
  • Strong programming skills in Python, Golang, and/or Rust
Responsibilities
  • Design, develop, deploy, and maintain reliable and scalable infrastructure that enables efficient training workloads
  • Manage large compute clusters for AI training and development
  • Create tooling and infrastructure that abstract compute and storage in ML workflows
  • Measure and optimize system performance
  • Monitor and troubleshoot infrastructure issues, ensuring high availability and performance of AI workloads
  • Stay up to date with the latest advancements in AI infrastructure technologies and recommend improvements to enhance system efficiency and performance
  • Work closely with AI software engineering teams to ensure infrastructure can handle all system requirements
  • Provide primary operational support and engineering for multiple teams

Hugging Face develops machine learning models and tooling for understanding and generating human-like text, with a focus on artificial intelligence and natural language processing. Its platform provides access to advanced models such as GPT-2 and XLNet, which can perform tasks such as text completion, translation, and summarization. Users can access these models through a web application and a repository, which makes it easy to integrate AI into different applications. Hugging Face offers a freemium model: basic features are free, and subscription plans unlock advanced functionality. The company also tailors solutions for large organizations, including custom model training. Its goal is to enable researchers, developers, and enterprises to use sophisticated language models effectively.

Company Stage: Series D
Total Funding: $395.2M
Headquarters: New York City, New York
Founded: 2016

Growth & Insights
Headcount

6-month growth: 23%
1-year growth: 58%
2-year growth: 107%

Simplify's Take

What believers are saying

  • Hugging Face's partnerships with companies like Sakana AI and Apple highlight its influence and integration within the AI ecosystem.
  • The release of compact language models like SmolLM demonstrates Hugging Face's commitment to democratizing AI by making powerful NLP capabilities accessible on mobile devices.
  • Recognition in industry awards and continuous innovation in AI models position Hugging Face as a leader in the AI and NLP sectors.

What critics are saying

  • The competitive landscape in AI and NLP is intense, with major players like OpenAI and Nvidia posing significant challenges.
  • Reliance on a freemium model may limit revenue growth if users do not convert to paid plans.

What makes Hugging Face unique

  • Hugging Face's focus on NLP and text generation models like GPT-2 and XLNet sets it apart from competitors who may offer more generalized AI solutions.
  • Their freemium model combined with enterprise solutions allows them to cater to a wide range of clients, from individual developers to large organizations.
  • The company's extensive repository and accessible web application make advanced AI tools available to a broader audience, enhancing user engagement and adoption.

Benefits

Flexible Work Environment

Health Insurance

Unlimited PTO

Equity

Growth, Training, & Conferences

Generous Parental Leave

INACTIVE