Internship
Posted on 1/29/2024
Develops advanced NLP models for text tasks
No salary listed
Remote in USA
The company has office spaces around the world, especially in the US, Canada, and Europe, but is very distributed and offers flexible working hours and remote options. Remote employees have the opportunity to visit the offices, and if needed, the company will outfit their workstation to ensure success.
Hugging Face develops machine learning models that can understand and generate human-like text, focusing on natural language processing (NLP). Their main products include advanced models like GPT-2 and XLNet, which can perform various tasks such as text completion, translation, and summarization. Users can access these models through a web application and a repository, making it easy to integrate AI into different applications. Unlike many competitors, Hugging Face offers a freemium model, providing basic features for free while charging for advanced functionalities and enterprise solutions tailored to large organizations. The company's goal is to empower researchers, developers, and businesses to utilize AI for text-related tasks effectively.
Company Size
501-1,000
Company Stage
Series D
Total Funding
$395.7M
Headquarters
New York City, New York
Founded
2016
Flexible Work Environment
Health Insurance
Unlimited PTO
Equity
Growth, Training, & Conferences
Generous Parental Leave
OpenAI launched a new family of AI models this morning that significantly improve coding abilities while cutting costs, responding directly to growing competition in the enterprise AI market.

The San Francisco-based AI company introduced three models — GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano — all available immediately through its API. The new lineup performs better at software engineering tasks, follows instructions more precisely, and can process up to one million tokens of context, equivalent to about 750,000 words.

“GPT-4.1 offers exceptional performance at a lower cost,” said Kevin Weil, chief product officer at OpenAI, during Monday’s announcement. “These models are better than GPT-4o on just about every dimension.”

Perhaps most significant for enterprise customers is the pricing: GPT-4.1 will cost 26% less than its predecessor, while the lightweight nano version becomes OpenAI’s most affordable offering at just 12 cents per million tokens.

How GPT-4.1’s improvements target enterprise developers’ biggest pain points

In a candid interview with VentureBeat, Michelle Pokrass, post-training research lead at OpenAI, emphasized that practical business applications drove the development process.

“GPT-4.1 was trained with one goal: being useful for developers,” Pokrass told VentureBeat. “We’ve found GPT-4.1 is much better at following the kinds of instructions that enterprises use in practice, which makes it much easier to deploy production-ready applications.”

This focus on real-world utility is reflected in benchmark results.
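The per-token pricing quoted above translates directly into request costs. A minimal sketch of that arithmetic, using only the figure reported in the article (12 cents per million tokens for the nano tier; the helper function and rate constant are illustrative, not an official price sheet):

```python
# Illustrative cost arithmetic based on the pricing quoted in the article.
# NANO_RATE is the reported $0.12 per million tokens; actual billing may
# differ (e.g., separate input/output rates), so treat this as a sketch.

NANO_RATE = 0.12  # dollars per million tokens (from the article)

def token_cost(tokens: int, price_per_million: float) -> float:
    """Return the dollar cost of processing `tokens` at a given rate."""
    return tokens / 1_000_000 * price_per_million

# Filling the full one-million-token context window at the nano rate:
print(f"${token_cost(1_000_000, NANO_RATE):.2f}")  # $0.12
```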
Intelligence is pervasive, yet its measurement seems subjective. At best, we approximate its measure through tests and benchmarks. Think of college entrance exams: every year, countless students sign up, memorize test-prep tricks and sometimes walk away with perfect scores. Does a single number, say a 100%, mean those who got it share the same intelligence — or that they’ve somehow maxed out their intelligence? Of course not.
If you are a regular social media user, you may have stumbled upon videos like this:

“China AI-generated videos depicting MAGA supporters - including Trump himself - working in warehouses sewing and manufacturing are going viral after the trade war between Beijing and Washington kicked off. pic.twitter.com/XP1jd4oyLC” — The American Way With Sheila Kay (@usasheilakay), April 11, 2025

Or this:

“Trump taking a 30 min lunch break from the factory work at the fat factory model he created that will make America great again pic.twitter.com/941SMKojGY” — Utamadush (@utamadush), April 9, 2025

Or this:

“Hilarious. China is making some real stuff though AI right now. China vs USA was heating up. MAGA #TrumpTariffs #stockmarketcrash #tariff pic.twitter.com/XYrT7tSIfZ” — Anshul Garg (@AnshulGarg1986), April 11, 2025

Thanks to the U.S. trade war, viral videos depicting overweight Americans toiling in fictional sweatshops have exploded across TikTok and X, racking up millions of views. These clips show exhausted workers stitching clothes in dismal factory conditions—a satirical jab at Trump's promise to bring manufacturing jobs back to America through hefty tariffs on Chinese goods.

One 32-second video created by TikTok user Ben Lau has been viewed millions of times, showing AI-generated Americans in dire factory conditions, complete with traditional Chinese music and ending with a snarky "Make America Great Again" slogan.

AI has become the new political cartoon, a powerful tool for political protest, and savvy users are coming up with creative ways to use their favorite models as a means to spread a message.

Ever wondered how they do it? It’s actually pretty easy. All you need is a powerful enough PC—or be willing to spend a few bucks/euros/pesos/pounds/yuan—to bring your ideas to life.
The race to expand large language models (LLMs) beyond the million-token threshold has ignited a fierce debate in the AI community. Models like MiniMax-Text-01 boast 4-million-token capacity, and Gemini 1.5 Pro can process up to 2 million tokens simultaneously. They now promise game-changing applications and can analyze entire codebases, legal contracts or research papers in a single inference call.

At the core of this discussion is context length — the amount of text an AI model can process and remember at once. A longer context window allows a machine learning (ML) model to handle much more information in a single request and reduces the need for chunking documents into sub-documents or splitting conversations.
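The chunking step that long-context models reduce the need for can be sketched in a few lines: split a document into fixed-size pieces, with some overlap so sentences straddling a boundary appear in both neighbors. The function name, sizes, and word-based (rather than tokenizer-based) splitting here are illustrative assumptions, not any particular framework's API:

```python
# Minimal sketch of document chunking for a limited context window.
# Real pipelines typically count tokenizer tokens, not words; word
# counts keep the example self-contained.

def chunk_words(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split `text` into word-count chunks where consecutive chunks
    share `overlap` words, so boundary context is not lost."""
    words = text.split()
    step = chunk_size - overlap  # advance by less than a full chunk
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), step)] or [""]
```

With a 1M-token context window, a document that once required dozens of such chunks (each processed in a separate call) can fit in a single request.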
Researchers at Together AI and Agentica have released DeepCoder-14B, a new coding model that delivers impressive performance comparable to leading proprietary models like OpenAI’s o3-mini. Built on top of DeepSeek-R1, this model offers more flexibility to integrate high-performance code generation and reasoning capabilities into real-world applications. Importantly, the teams have fully open-sourced the model, its training data, code, logs and system optimizations, which can help researchers improve their work and accelerate progress.

Competitive coding capabilities in a smaller package

The research team’s experiments show that DeepCoder-14B performs strongly across several challenging coding benchmarks, including LiveCodeBench (LCB), Codeforces and HumanEval+.

“Our model demonstrates strong performance across all coding benchmarks… comparable to the performance of o3-mini (low) and o1,” the researchers write in a blog post that describes the model.

Interestingly, despite being trained primarily on coding tasks, the model shows improved mathematical reasoning, scoring 73.8% on the AIME 2024 benchmark, a 4.1% improvement over its base model (DeepSeek-R1-Distill-Qwen-14B). This suggests that the reasoning skills developed through RL on code can generalize effectively to other domains.

The most striking aspect is achieving this level of performance with only 14 billion parameters, which makes DeepCoder significantly smaller and potentially more efficient to run than many frontier models.

Innovations driving DeepCoder’s performance

While developing the model, the researchers solved some of the key challenges in training coding models using reinforcement learning (RL). The first challenge was curating the training data.
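When training coding models with RL, a common reward design is binary and test-based: a candidate program earns reward only if it passes the problem's unit tests. The article does not specify DeepCoder's exact reward function, so the sketch below is an illustrative stand-in for that general technique; `unit_test_reward` and the `solve` convention are invented for the example:

```python
# Hedged sketch of a binary, unit-test-based RL reward for code
# generation. Not DeepCoder's actual reward: the function name and the
# convention that candidates define `solve` are assumptions for
# illustration. exec() on model output is unsafe outside a sandbox.

def unit_test_reward(candidate_src: str, tests: list[tuple[object, object]]) -> float:
    """Execute `candidate_src` (expected to define `solve`) and return
    1.0 only if every (input, expected_output) test passes, else 0.0."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)          # run the candidate program
        solve = namespace["solve"]              # look up its entry point
        return 1.0 if all(solve(x) == want for x, want in tests) else 0.0
    except Exception:                           # crashes or syntax errors score 0
        return 0.0

# A correct candidate earns the full reward:
good = "def solve(x):\n    return x * 2"
print(unit_test_reward(good, [(1, 2), (3, 6)]))  # 1.0
```

An all-or-nothing reward like this gives no partial credit, which is one reason curating problems with reliable, executable tests (the data-curation challenge mentioned above) matters so much.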