Full-Time

Cloud Machine Learning Engineer

Posted on 11/12/2024

Hugging Face

501-1,000 employees

Open-source tools for machine learning community

No salary listed

Paris, France

Remote

Remote position; candidates can be based in France.

Category
🤖 AI & Machine Learning
Required Skills
Kubernetes
Rust
Microsoft Azure
PyTorch
Docker
TypeScript
AWS
MongoDB
Google Cloud Platform
Requirements
  • Deep experience building with Hugging Face technologies, including Transformers, Diffusers, Accelerate, PEFT, and Datasets
  • Expertise in a deep learning framework, preferably PyTorch; an understanding of XLA is a plus (see the sketch after this list)
  • Strong knowledge of cloud platforms such as AWS and services like Amazon SageMaker, EC2, S3, and CloudWatch, and/or their Azure and GCP equivalents
  • Experience building MLOps pipelines and containerizing models and solutions with Docker
  • Familiarity with TypeScript, Rust, MongoDB, and Kubernetes is helpful
  • Ability to write clear documentation, examples, and definitions, and to work across the full product development lifecycle
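
As a concrete illustration of the Transformers-plus-PyTorch experience the role calls for, here is a minimal sketch of loading a pretrained 🤗 model and generating text. The checkpoint and prompt are illustrative choices, not specified by the posting:

```python
# Minimal sketch: load a pretrained 🤗 Transformers model and run inference
# with PyTorch. "distilgpt2" is an arbitrary small example checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, Hugging Face!", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
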
Responsibilities
  • Bridging and integrating 🤗 transformers/diffusers models with different cloud providers (a hedged deployment sketch follows this list)
  • Ensuring those models meet expected performance targets
  • Designing and developing easy-to-use, secure, and robust developer experiences and APIs for our users
  • Writing technical documentation, examples, and notebooks to demonstrate new features
  • Sharing and advocating your work and its results with the community
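
A hedged sketch of one common pattern behind the cloud-integration responsibility: deploying a Hub model to an Amazon SageMaker endpoint with the sagemaker Python SDK. The model id, IAM role ARN, container versions, and instance type are assumptions for illustration, not details from the posting:

```python
# Hedged sketch: deploy a Hugging Face Hub model to a SageMaker endpoint.
# All identifiers below are placeholders chosen for illustration.
from sagemaker.huggingface import HuggingFaceModel

hub_env = {
    "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # example model
    "HF_TASK": "text-classification",
}

model = HuggingFaceModel(
    env=hub_env,
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder ARN
    transformers_version="4.37",  # assumed supported container versions
    pytorch_version="2.1",
    py_version="py310",
)

# Spin up a real-time inference endpoint and send a test request.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
print(predictor.predict({"inputs": "I love open source!"}))
```
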
Desired Qualifications
  • Bonus: Experience with Svelte & TailwindCSS

Hugging Face focuses on providing tools and resources for the machine learning community, particularly in natural language processing (NLP). Its main product, the Transformers library, is an open-source collection of pretrained models for NLP tasks such as text classification, information extraction, question answering, and text generation, and it has become a widely used resource among developers and researchers. Hugging Face also operates the Hugging Face Hub, a platform where users share models, datasets, and demos, promoting collaboration within the community. Unlike many competitors, Hugging Face maintains a business model that supports enterprise-level services while keeping its core tools free for public use. The company's goal is to foster an open and collaborative environment for machine learning development.

Company Size

501-1,000

Company Stage

Series D

Total Funding

$395.7M

Headquarters

New York City, New York

Founded

2016

Simplify's Take

What believers are saying

  • Integration with Groq enhances Hugging Face's platform with high-performance AI models.
  • MiniMax-M1 on Hugging Face offers expansive context window for long-context reasoning tasks.
  • Affordable open-source robots could expand Hugging Face's market in home AI exploration.

What critics are saying

  • MiniMax-M1 challenges Hugging Face's dominance in open-source AI model space.
  • Groq's integration may increase competition from AWS and Google in AI inference.
  • Mistral's Magistral models could attract developers away from Hugging Face's offerings.

What makes Hugging Face unique

  • Hugging Face's Transformers library is a standard tool for NLP developers.
  • The Hugging Face Hub fosters an open and collaborative machine learning ecosystem.
  • Their business model monetizes open-source tools through enterprise support and private hosting.

Benefits

Flexible Work Environment

Health Insurance

Unlimited PTO

Equity

Growth, Training, & Conferences

Generous Parental Leave

Growth & Insights

Headcount

6 month growth

1%

1 year growth

3%

2 year growth

0%
Company News

TechCrunch
Jul 11th, 2025
Hugging Face’s new robot is the Seinfeld of AI devices

Hugging Face’s new programmable Reachy Mini bots launched this week. The AI robots are open source, Raspberry Pi-powered, and come with cartoonish antennae and big googly eyes. They don’t do much out of the box. And that’s kind of the point.

Today, on TechCrunch’s Equity podcast, hosts Kirsten Korosec, Max Zeff, and Anthony Ha dig into the launch of Reachy Mini, which pulled in a surprising $500,000 in sales in its first 24 hours. As open-source companies like Hugging Face explore physical products, Kirsten and Max agree that Reachy Mini might be the Seinfeld of AI hardware: the bots might do nothing in particular, but they’re still captivating.

Listen to the full episode to hear more news from the week. Equity will be back next week, so stay tuned! Equity is TechCrunch’s flagship podcast, produced by Theresa Loconsolo, and posts every Wednesday and Friday. Subscribe on Apple Podcasts, Overcast, Spotify, and all the casts. You can also follow Equity on X and Threads at @EquityPod.

PR Newswire
Jun 17th, 2025
APTO Releases High-Accuracy Japanese Reasoning Data for LLM Fine-Tuning, Free of Charge

TOKYO, June 17, 2025 /PRNewswire/ -- APTO is pleased to announce the release of a free dataset for fine-tuning reasoning models such as OpenAI's o1 and DeepSeek's DeepSeek-R1. The dataset can help improve reasoning ability in Japanese and reduce redundant inference.

VentureBeat
Jun 16th, 2025
MiniMax-M1 Is a New Open Source Model With 1 Million Token Context and New, Hyper-Efficient Reinforcement Learning

Chinese AI startup MiniMax, perhaps best known in the West for its hit realistic AI video model Hailuo, has released its latest large language model, MiniMax-M1 — and in great news for enterprises and developers, it’s completely open source under an Apache 2.0 license, meaning businesses can take it, use it for commercial applications, and modify it to their liking without restriction or payment.

M1 is an open-weight offering that sets new standards in long-context reasoning, agentic tool use, and efficient compute performance. It’s available today on the AI code sharing community Hugging Face and Microsoft’s rival code sharing community GitHub, the first release of what the company dubbed “MiniMaxWeek” on its social account on X, with further product announcements expected.

MiniMax-M1 distinguishes itself with a context window of 1 million input tokens and up to 80,000 tokens of output, positioning it as one of the most expansive models available for long-context reasoning tasks. The “context window” in large language models (LLMs) refers to the maximum number of tokens the model can process at one time — including both input and output.
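
To make the context-window point concrete, a small sketch of how tokens are counted; the tokenizer here is GPT-2's, a stand-in rather than MiniMax-M1's own:

```python
# Illustrative sketch: a "context window" is a budget of tokens, so counting
# tokens shows how quickly long documents consume it. GPT-2's tokenizer is a
# stand-in; MiniMax-M1 uses its own vocabulary.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "MiniMax-M1 supports a context window of 1 million input tokens."
print(len(tokenizer.encode(text)), "tokens")  # input + output must fit the window
```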

VentureBeat
Jun 16th, 2025
Groq Just Made Hugging Face Way Faster — And It’s Coming for AWS and Google

Groq, the artificial intelligence inference startup, is making an aggressive play to challenge established cloud providers like Amazon Web Services and Google with two major announcements that could reshape how developers access high-performance AI models.

The company announced Monday that it now supports Alibaba’s Qwen3 32B language model with its full 131,000-token context window — a technical capability it claims no other fast inference provider can match. Simultaneously, Groq became an official inference provider on Hugging Face’s platform, potentially exposing its technology to millions of developers worldwide.

The move is Groq’s boldest attempt yet to carve out market share in the rapidly expanding AI inference market, where companies like AWS Bedrock, Google Vertex AI, and Microsoft Azure have dominated by offering convenient access to leading language models.

“The Hugging Face integration extends the Groq ecosystem providing developers choice and further reduces barriers to entry in adopting Groq’s fast and efficient AI inference,” a Groq spokesperson told VentureBeat. “Groq is the only inference provider to enable the full 131K context window, allowing developers to build applications at scale.”

How Groq’s 131K context window claims stack up against AI inference competitors

Groq’s assertion about context windows — the amount of text an AI model can process at once — strikes at a core limitation that has plagued practical AI applications. Most inference providers struggle to maintain speed and cost-effectiveness when handling large context windows, which are essential for tasks like analyzing entire documents or maintaining long conversations. Independent benchmarking firm Artificial Analysis measured Groq’s Qwen3 32B deployment running at approximately 535 tokens per second, a speed that would allow real-time processing of lengthy documents or complex reasoning tasks.
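
A hedged sketch of what the integration looks like from the developer side, assuming the huggingface_hub client's inference-provider routing; the model id matches the article, and reading the access token from the HF_TOKEN environment variable is an assumption:

```python
# Hedged sketch: route a chat completion through Groq as a Hugging Face
# inference provider (assumes huggingface_hub with provider routing and an
# HF_TOKEN set in the environment).
from huggingface_hub import InferenceClient

client = InferenceClient(provider="groq")
response = client.chat.completions.create(
    model="Qwen/Qwen3-32B",  # the model discussed in the article
    messages=[{"role": "user", "content": "Summarize this 100-page contract."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```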

Fox News
Jun 16th, 2025
New robots make AI something anyone can try at home

Tech expert Kurt Knutsson says Hugging Face has launched affordable open-source humanoid robots for home AI exploration.

INACTIVE