Internship

Machine Learning Engineer Internship

LLM Evaluation, Remote

Posted on 1/29/2024

Hugging Face

501-1,000 employees

Develops advanced NLP models for text tasks

No salary listed

Remote in USA

The company has offices around the world, particularly in the US, Canada, and Europe, but operates as a largely distributed team with flexible working hours and remote options. Remote employees are welcome to visit the offices, and if needed, the company will outfit their home workstation to set them up for success.

Category
AI & Machine Learning
Requirements
  • Understanding of machine learning concepts
  • Experience with evaluating machine learning models (a minimal scoring sketch follows these lists)
  • Ability to work with datasets and frontend technologies
  • Strong communication and community interaction skills
Responsibilities
  • Contribute to the 'Open LLM Leaderboard' to promote its growth
  • Implement fair and interesting ways to display model results
  • Engage in community interactions and discussions
  • Contribute to future iterations of the leaderboard
Desired Qualifications
  • Experience working with large communities of ML professionals and enthusiasts
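
As a rough illustration of the evaluation work these lists describe, the sketch below ranks multiple-choice answers by their log-likelihood under a small causal language model, which is the general idea behind leaderboard tasks such as MMLU or HellaSwag. It is a minimal, assumption-laden sketch rather than the Open LLM Leaderboard's actual harness; the model name, prompt, and answer choices are placeholders.

```python
# Minimal sketch (not the leaderboard's actual code): rank answer choices by
# their summed log-likelihood under a small causal LM. Model, prompt, and
# choices are placeholders for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; leaderboard models are far larger
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Question: What is the capital of France?\nAnswer:"
choices = [" Paris", " Berlin", " Madrid"]

def choice_log_likelihood(prompt: str, choice: str) -> float:
    """Summed log-probability of the choice tokens, conditioned on the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i + 1 of the input sequence.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    start = prompt_ids.shape[1] - 1  # first target position belonging to the choice
    idx = torch.arange(start, targets.shape[0])
    return log_probs[idx, targets[idx]].sum().item()

scores = {c: choice_log_likelihood(prompt, c) for c in choices}
print(max(scores, key=scores.get))  # expected: " Paris"
```

In a real harness the same idea is applied over thousands of questions, often with few-shot examples prepended to the prompt, and the resulting per-task accuracies are what the leaderboard displays.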

Hugging Face develops tools and machine learning models that can understand and generate human-like text, with a focus on natural language processing (NLP). Its platform provides open-source models such as GPT-2 and XLNet, which can perform tasks such as text completion, translation, and summarization. Users can access these models through a web application and a model repository, making it easy to integrate AI into different applications. Unlike many competitors, Hugging Face offers a freemium model, providing basic features for free while charging for advanced functionality and enterprise solutions tailored to large organizations. The company's goal is to enable researchers, developers, and businesses to use AI effectively for text-related tasks.
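
As a small, hedged illustration of how those hosted models are typically accessed in code, the snippet below loads a publicly available summarization model from the Hugging Face Hub through the transformers pipeline API; the specific checkpoint name is just one public example, not something named in this posting.

```python
# Minimal sketch: pull a hosted model from the Hugging Face Hub and run one of
# the text tasks mentioned above (summarization). The checkpoint is one public
# example chosen for illustration.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "Hugging Face hosts a large catalog of open models that developers can "
    "download and run locally or call through hosted inference services. "
    "The same hub also stores datasets and demo applications, so a single "
    "account gives access to models, data, and example apps."
)
print(summarizer(article, max_length=40, min_length=10)[0]["summary_text"])
```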

Company Size

501-1,000

Company Stage

Series D

Total Funding

$395.7M

Headquarters

New York City, New York

Founded

2016

Simplify Jobs

Simplify's Take

What believers are saying

  • Integration with KServe enhances Hugging Face's model deployment and autoscaling.
  • Meta's Llama 4 models on Hugging Face could boost developer engagement.
  • Deep Cogito's open-source models offer collaboration opportunities for hybrid reasoning.

What critics are saying

  • OpenAI's GPT-4.1 models pose a competitive threat to Hugging Face's solutions.
  • Meta's Llama 4 models could reduce Hugging Face's NLP market share.
  • DeepCoder-14B's performance may attract developers away from Hugging Face.

What makes Hugging Face unique

  • Hugging Face offers a unique freemium model for AI and NLP services.
  • The company provides open-source NLP models like GPT-2 and XLNet.
  • Hugging Face's platform supports diverse clients, from researchers to enterprises.

Benefits

Flexible Work Environment

Health Insurance

Unlimited PTO

Equity

Growth, Training, & Conferences

Generous Parental Leave

Growth & Insights and Company News

Headcount

6 month growth: 3%
1 year growth: 3%
2 year growth: 1%

VentureBeat
Apr 14th, 2025
OpenAI's New GPT-4.1 Models Can Process a Million Tokens and Solve Coding Problems Better Than Ever

OpenAI launched a new family of AI models this morning that significantly improve coding abilities while cutting costs, responding directly to growing competition in the enterprise AI market. The San Francisco-based AI company introduced three models — GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano — all available immediately through its API. The new lineup performs better at software engineering tasks, follows instructions more precisely, and can process up to one million tokens of context, equivalent to about 750,000 words.

"GPT-4.1 offers exceptional performance at a lower cost," said Kevin Weil, chief product officer at OpenAI, during Monday's announcement. "These models are better than GPT-4o on just about every dimension."

Perhaps most significant for enterprise customers is the pricing: GPT-4.1 will cost 26% less than its predecessor, while the lightweight nano version becomes OpenAI's most affordable offering at just 12 cents per million tokens.

How GPT-4.1's improvements target enterprise developers' biggest pain points

In a candid interview with VentureBeat, Michelle Pokrass, post-training research lead at OpenAI, emphasized that practical business applications drove the development process. "GPT-4.1 was trained with one goal: being useful for developers," Pokrass told VentureBeat. "We've found GPT-4.1 is much better at following the kinds of instructions that enterprises use in practice, which makes it much easier to deploy production-ready applications." This focus on real-world utility is reflected in benchmark results

VentureBeat
Apr 13th, 2025
Beyond ARC-AGI: GAIA and the Search for a Real Intelligence Benchmark

Intelligence is pervasive, yet its measurement seems subjective. At best, we approximate its measure through tests and benchmarks. Think of college entrance exams: every year, countless students sign up, memorize test-prep tricks and sometimes walk away with perfect scores. Does a single number, say a 100%, mean those who got it share the same intelligence — or that they've somehow maxed out their intelligence? Of course not

Decrypt
Apr 13th, 2025
The Underground Guide to Making AI-Generated Viral Videos

If you are a regular social media user, you may have stumbled upon AI-generated videos depicting MAGA supporters, including Trump himself, working in warehouses sewing and manufacturing goods; the clips went viral after the trade war between Beijing and Washington kicked off.

Thanks to the U.S. trade war, viral videos depicting overweight Americans toiling in fictional sweatshops have exploded across TikTok and X, racking up millions of views. These clips show exhausted workers stitching clothes in dismal factory conditions—a satirical jab at Trump's promise to bring manufacturing jobs back to America through hefty tariffs on Chinese goods. One 32-second video created by TikTok user Ben Lau has been viewed millions of times, showing AI-generated Americans in dire factory conditions, complete with traditional Chinese music and ending with a snarky "Make America Great Again" slogan.

AI has become the new political cartoon, a powerful tool for political protest, and savvy users are coming up with creative ways to use their favorite models as a means to spread a message. Ever wondered how they do it? It's actually pretty easy. All you need is a powerful enough PC—or be willing to spend a few bucks/euros/pesos/pounds/yuan—to bring your ideas to life

VentureBeat
Apr 12th, 2025
Bigger Isn't Always Better: Examining the Business Case for Multi-Million Token LLMs

The race to expand large language models (LLMs) beyond the million-token threshold has ignited a fierce debate in the AI community. Models like MiniMax-Text-01 boast 4-million-token capacity, and Gemini 1.5 Pro can process up to 2 million tokens simultaneously. They now promise game-changing applications and can analyze entire codebases, legal contracts or research papers in a single inference call.

At the core of this discussion is context length — the amount of text an AI model can process and also remember at once. A longer context window allows a machine learning (ML) model to handle much more information in a single request and reduces the need for chunking documents into sub-documents or splitting conversations
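
As a rough illustration of the chunking trade-off the article describes, the sketch below counts a document's tokens with a Hugging Face tokenizer and checks whether it fits in a single context window; the tokenizer choice, file name, and 1M-token window are assumptions for illustration, not measurements from the article.

```python
# Rough sketch: check whether a document fits a model's context window before
# sending it in one request. Tokenizer, file name, and window size are
# illustrative assumptions only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in tokenizer
context_window = 1_000_000  # e.g. the ~1M-token window cited for GPT-4.1

with open("contract.txt") as f:  # hypothetical document
    document = f.read()

n_tokens = len(tokenizer.encode(document))
if n_tokens <= context_window:
    print(f"{n_tokens} tokens: fits in a single request, no chunking needed")
else:
    n_chunks = -(-n_tokens // context_window)  # ceiling division
    print(f"{n_tokens} tokens: split into about {n_chunks} chunks")
```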

VentureBeat
Apr 10th, 2025
DeepCoder Delivers Top Coding Performance in Efficient 14B Open Model

Researchers at Together AI and Agentica have released DeepCoder-14B, a new coding model that delivers impressive performance comparable to leading proprietary models like OpenAI's o3-mini. Built on top of DeepSeek-R1, this model gives more flexibility to integrate high-performance code generation and reasoning capabilities into real-world applications. Importantly, the teams have fully open-sourced the model, its training data, code, logs and system optimizations, which can help researchers improve their work and accelerate progress.

Competitive coding capabilities in a smaller package

The research team's experiments show that DeepCoder-14B performs strongly across several challenging coding benchmarks, including LiveCodeBench (LCB), Codeforces and HumanEval+. "Our model demonstrates strong performance across all coding benchmarks… comparable to the performance of o3-mini (low) and o1," the researchers write in a blog post that describes the model.

Interestingly, despite being trained primarily on coding tasks, the model shows improved mathematical reasoning, scoring 73.8% on the AIME 2024 benchmark, a 4.1% improvement over its base model (DeepSeek-R1-Distill-Qwen-14B). This suggests that the reasoning skills developed through RL on code can be generalized effectively to other domains.

The most striking aspect is achieving this level of performance with only 14 billion parameters. This makes DeepCoder significantly smaller and potentially more efficient to run than many frontier models.

Innovations driving DeepCoder's performance

While developing the model, the researchers solved some of the key challenges in training coding models using reinforcement learning (RL). The first challenge was curating the training data

INACTIVE