Internship
Posted on 12/22/2023
Develops advanced NLP models for text tasks
$3k - $4k/mo
Remote
The company has office spaces around the world, especially in the US, Canada, and Europe, but is very distributed and offers flexible working hours and remote options. Remote employees have the opportunity to visit the offices, and if needed, the company will outfit their workstation to ensure success.
Hugging Face develops machine learning models that understand and generate human-like text, focusing on natural language processing (NLP). Their main products include models like GPT-2 and XLNet, which can perform tasks such as text completion, translation, and summarization. Users can access these models through a web application and a repository, making it easy to integrate AI into various applications. Unlike many competitors, Hugging Face offers a freemium model, providing basic features for free while charging for advanced functionalities and enterprise solutions tailored to large organizations. The company's goal is to empower researchers, developers, and businesses to utilize AI for text-related tasks effectively.
Company Size
501-1,000
Company Stage
Series D
Total Funding
$395.7M
Headquarters
New York City, New York
Founded
2016
Flexible Work Environment
Health Insurance
Unlimited PTO
Equity
Growth, Training, & Conferences
Generous Parental Leave
Researchers at Together AI and Agentica have released DeepCoder-14B, a new coding model that delivers performance comparable to leading proprietary models like OpenAI’s o3-mini. Built on top of DeepSeek-R1, the model offers more flexibility for integrating high-performance code generation and reasoning capabilities into real-world applications. Importantly, the teams have fully open-sourced the model, its training data, code, logs, and system optimizations, which can help researchers improve their work and accelerate progress.

Competitive coding capabilities in a smaller package

The research team’s experiments show that DeepCoder-14B performs strongly across several challenging coding benchmarks, including LiveCodeBench (LCB), Codeforces, and HumanEval+. “Our model demonstrates strong performance across all coding benchmarks… comparable to the performance of o3-mini (low) and o1,” the researchers write in a blog post that describes the model.

Interestingly, despite being trained primarily on coding tasks, the model shows improved mathematical reasoning, scoring 73.8% on the AIME 2024 benchmark, a 4.1% improvement over its base model (DeepSeek-R1-Distill-Qwen-14B). This suggests that reasoning skills developed through RL on code can generalize effectively to other domains.

Credit: Together AI

The most striking aspect is that DeepCoder achieves this level of performance with only 14 billion parameters, which makes it significantly smaller and potentially more efficient to run than many frontier models.

Innovations driving DeepCoder’s performance

While developing the model, the researchers solved some of the key challenges in training coding models using reinforcement learning (RL). The first challenge was curating the training data
Meta unveiled its newest artificial intelligence models this week, releasing the much-anticipated Llama 4 LLM to developers while teasing a much larger model still in training. The model is state of the art, and Zuck’s company claims it can compete against the best closed-source models without the need for any fine-tuning.

“These models are our best yet thanks to distillation from Llama 4 Behemoth, a 288 billion active parameter model with 16 experts that is our most powerful yet and among the world’s smartest LLMs,” Meta said in an official announcement. “Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and we’re excited to share more details about it even while it’s still in flight.”

Both Llama 4 Scout and Maverick use 17 billion active parameters per inference but differ in the number of experts: Scout uses 16, while Maverick uses 128. Both models are now available for download on llama.com and Hugging Face, and Meta is also integrating them into WhatsApp, Messenger, Instagram, and its Meta.AI website. The mixture-of-experts (MoE) architecture is not new to the technology world, but it is new to Llama, and it is a way to make a model super efficient.
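The active-versus-total parameter distinction above is the crux of MoE: a small router picks a few experts per token, so only a fraction of the weights participate in each forward pass. Below is a minimal, illustrative sketch in plain Python; the dimensions are toy values and the simple top-k softmax router is a generic scheme, not Llama 4's actual (unpublished) routing details.

```python
import math
import random

random.seed(0)

D = 8           # hidden size (toy value)
N_EXPERTS = 4   # Llama 4 Scout reportedly uses 16 experts; Maverick 128
TOP_K = 1       # experts activated per token

# Each expert is a toy feed-forward layer: a D x D weight matrix.
experts = [[[random.gauss(0, 0.1) for _ in range(D)] for _ in range(D)]
           for _ in range(N_EXPERTS)]
# The router assigns each expert a score for a given token.
router = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(N_EXPERTS)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(xs):
    mx = max(xs)
    es = [math.exp(x - mx) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_forward(token):
    # 1. The router scores every expert...
    scores = softmax(matvec(router, token))
    # 2. ...but only the top-k experts actually run, so compute per token
    #    scales with *active* parameters, not total parameters.
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    out = [0.0] * D
    for i in top:
        y = matvec(experts[i], token)
        out = [o + scores[i] * yi for o, yi in zip(out, y)]
    return out, top

token = [random.gauss(0, 1) for _ in range(D)]
out, active = moe_forward(token)
print("active experts:", active)
```

With TOP_K = 1, only one of the four expert matrices is multiplied per token; a dense layer of the same total size would run all four every time.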
Deep Cogito, a new AI research startup based in San Francisco, officially emerged from stealth today with Cogito v1, a new line of open-source large language models (LLMs) fine-tuned from Meta’s Llama 3.2 and equipped with hybrid reasoning capabilities — the ability to answer quickly and immediately, or to “self-reflect” like OpenAI’s “o” series and DeepSeek R1.

The company aims to push the boundaries of AI beyond current human-overseer limitations by enabling models to iteratively refine and internalize their own improved reasoning strategies. It is ultimately on a quest to develop superintelligence — AI smarter than all humans in all domains — yet the company says that “All models we create will be open sourced.”

Deep Cogito’s CEO and co-founder Drishan Arora — a former senior software engineer at Google who says he led large language model (LLM) modeling for Google’s generative search product — also said in a post on X that they are “the strongest open models at their scale – including those from LLaMA, DeepSeek, and Qwen.”

The initial model lineup includes five base sizes — 3 billion, 8 billion, 14 billion, 32 billion, and 70 billion parameters — available now on the AI code-sharing community Hugging Face and Ollama, and through application programming interfaces (APIs) on Fireworks and Together AI. They are available under the Llama licensing terms, which allow commercial usage — so third-party enterprises could put them to work in paid products — up to 700 million monthly users, at which point a paid license from Meta is required. The company plans to release even larger models, up to 671 billion parameters, in the coming months.

Arora describes the company’s training approach, iterated distillation and amplification (IDA), as a novel alternative to traditional reinforcement learning from human feedback (RLHF) or teacher-model distillation. The core idea behind IDA is to allocate more compute for a model to generate improved solutions, then distill the improved reasoning process into the model’s own parameters, effectively creating a feedback loop for capability growth. Arora likens this approach to Google AlphaGo’s self-play strategy, applied to natural language.

The Cogito models are open-source and available for download via Hugging Face and Ollama, or through APIs provided by Fireworks AI and Together AI. Each model supports both a standard mode for direct answers and a reasoning mode, in which the model reflects internally before responding.

Benchmarks and evaluations

The company shared a broad set of evaluation results comparing Cogito models to open-source peers across general knowledge, mathematical reasoning, and multilingual tasks. Highlights include:

Cogito 3B (Standard) outperforms LLaMA 3.2 3B on MMLU by 6.7 percentage points (65.4% vs
I am very excited to see continued collaboration and new features from KServe being integrated in the Kubeflow 1.10 release, particularly the model cache feature and the integration with Hugging Face, which enables more streamlined deployment and efficient autoscaling for both predictive and generative models.
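As an illustration of the Hugging Face integration mentioned in the quote, KServe can serve a Hub model through its Hugging Face serving runtime. The manifest below is a sketch only: the service name and model ID are placeholders, and the exact runtime flags may vary between KServe versions.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: hf-text-classifier        # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface         # KServe's Hugging Face serving runtime
      args:
        - --model_name=classifier
        - --model_id=distilbert-base-uncased-finetuned-sst-2-english
      resources:
        limits:
          nvidia.com/gpu: "1"
```

Applied with `kubectl apply -f`, a manifest like this lets KServe pull the model from the Hub and expose it behind an autoscaled inference endpoint.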
Artificial intelligence (AI) is now a household word, thanks to the popularity of large language models like ChatGPT. These large models are trained on the whole internet and often have hundreds of billions of parameters — settings inside the model that help it guess what word comes next in a sequence. The more parameters, the []

The post AI Explained: What’s a Small Language Model and How Can Business Use It? appeared first on PYMNTS.com.
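To make "parameters" concrete, the count for a small model can be tallied from its published configuration. Below is a back-of-the-envelope sketch for a GPT-2-small-sized transformer using the standard decoder-block parameter arithmetic; bias and LayerNorm terms are omitted for simplicity.

```python
# Rough parameter count for a GPT-2-small-sized transformer
# (configuration values from the published GPT-2 config; bias and
# LayerNorm terms omitted, which changes the total only slightly).
vocab_size = 50257
context    = 1024
d_model    = 768
n_layers   = 12

embeddings = vocab_size * d_model + context * d_model   # token + position tables
attention  = 4 * d_model * d_model                      # Q, K, V, output projections
mlp        = 2 * d_model * (4 * d_model)                # up- and down-projection
per_layer  = attention + mlp

total = embeddings + n_layers * per_layer
print(f"{total / 1e6:.0f}M parameters")   # ≈ 124M, the figure quoted for GPT-2 small
```

A model with "hundreds of billions" of parameters is simply this same arithmetic with a larger hidden size and many more layers, which is why small language models are so much cheaper to train and run.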