Year-round

Research Intern - Multimodal LLM

Speech/Music/Audio/Vision/Language

Posted on 5/16/2026

Tencent


10,001+ employees

Multifaceted tech platform: social, gaming, fintech

Compensation Overview

$38.54 - $60/hr

Bellevue, WA, USA

In Person

Category
AI & Machine Learning
Required Skills
LLM
Python
Neural Networks
Pytorch
Machine Learning
C/C++
Computer Vision
Requirements
  • Ph.D. student in computer science, electrical engineering, mathematics, or a related field
  • Self-motivated and excited about developing novel techniques
  • Research experience in natural language processing, speech, audio, and music processing, computer vision, dialog systems, or machine learning
  • Good publication track record and a history of creativity and intellectual flexibility
  • Ability to program in Python and/or C++ and experience using at least one of the leading deep learning toolkits
  • Intern duration: 3 months (with the possibility of extension); can start at any time in 2026.
  • Location: Bellevue, Washington, United States
  • The expected base pay range is provided for reference; actual pay may vary based on knowledge, skills, and experience
  • Eligible for paid sick leave and company medical plan for interns
  • The internship is at Tencent AI Lab in the Seattle area
Responsibilities
  • Work with researchers on a project aimed at attacking one of the core problems in multimodal artificial intelligence by inventing cutting-edge techniques.
  • Publish results from the internship.
  • Contribute to projects spanning multimodal pretraining and post-training strategies for audio, speech, music, image, and video understanding and generation.
  • Aim to enable full-duplex conversations, design more efficient large-model architectures, enhance multimodal memory and reasoning capabilities, and advance techniques for processing audio, speech, music, image, and video (including encoding, tokenization, and representation learning), with a focus on multimodal applications and end-to-end large models.

Tencent is a global technology platform that connects people and businesses through a wide range of services, including social networking, gaming, fintech, and cloud computing. Its flagship products include WeChat, a messaging and mobile payments app with over a billion users; Tencent Games, a major game publisher; Tencent Cloud for storage and computing needs; and fintech services such as mobile payments and wealth management. The company stands out by offering a large, integrated ecosystem that combines social, payments, gaming, and cloud services in one place. Its goal is to enrich daily life for internet users and help businesses modernize and operate more efficiently.

Company Size

10,001+

Company Stage

IPO

Headquarters

Shenzhen, China

Founded

1998

Simplify Jobs

Simplify's Take

What believers are saying

  • WeChat AI agent testing can deepen engagement inside Tencent's largest consumer platform.
  • AI spending above 36 billion yuan supports product rollout and infrastructure upgrades.
  • AI-driven improvements in gaming and advertising are already lifting earnings.

What critics are saying

  • Tencent still trails ByteDance and Alibaba in AI adoption and model progress.
  • The WeChat AI launch depends on compliance approval, creating a direct risk of regulatory delay.
  • Heavy AI spending can pressure margins if monetization lags behind investment.

What makes Tencent unique

  • WeChat gives Tencent a super-app distribution layer across messaging, payments, and services.
  • Tencent combines gaming, advertising, fintech, and cloud businesses under one ecosystem.
  • Its Shenzhen headquarters and 1998 founding anchor a long-running China internet franchise.

Benefits

Professional Development Budget

Growth & Insights and Company News

Headcount

6 month growth

0%

1 year growth

0%

2 year growth

0%
InterSpace Distribution
Jun 8th, 2026
Thai pop music demonstrates significant revenue growth.

Thai pop music is experiencing rapid expansion, driven by dedicated fandom and increasing revenue from live events and merchandise. Thai pop (T-pop) has been gaining international recognition, fueled by the influence of dedicated fanbases. Revenue for the T-pop label collective is projected to reach Bt11 billion (£250 million) in 2026 and Bt13 billion (£295 million) by 2029, indicating a recovery from pandemic-era levels. Research indicates that while superfans make up approximately 2% of an artist's total listenership, they can contribute up to 42% of that artist's revenue. A study found that 81% of T-pop listeners identify as Gen Z.

Growth in the sector is primarily attributed to increased sales of live event tickets and merchandise, which sustain a song's popularity beyond its initial release. Streaming, social media engagement, fan gatherings, strategic partnerships, and improvements in production quality have all contributed to the genre's rise, particularly through the popularity of Thai GL and BL series, which also promote the artists involved.

Successful strategies for T-pop include prioritizing partnerships. For example, Thai rapper Milli gained prominence in South Korea after appearing on a Korean game show, and GMM Music expanded its international reach through a partnership with Tencent. Building strong fan ecosystems through brand collaborations is also proving effective, with 84% of T-pop fans reporting that they have bought products or services endorsed by their favorite artists. Exclusive merchandise, such as photo cards and limited-edition items, and exclusive fan meetups are particularly valuable offerings.

EcoTopical
Jun 3rd, 2026
Mitsubishi, Tencent and WWF unite to kick-start carbon credit buying in Asia.

Trellis, by Jim Giles, Jun 3, 2026: Asked to name a company with an ambitious climate program, even sustainability veterans would likely choose one from North America or Europe. But over the past year or so, a series of private- and public-sector initiatives have moved the center of gravity of corporate sustainability towards Asia. The most...

AASTOCKS.com Limited
May 29th, 2026
Tencent invests in brain-computer interface chip maker Nuanxinjia

Hangzhou Nuanxinjia Electronic Technology, a Chinese brain-computer interface and neurobiological chip developer, has received investment from Tencent through its subsidiary Shanghai Qishan Investment. The company's registered capital increased to CNY 16.89 million following the stake acquisition. Founded in 2014, Nuanxinjia specialises in integrated circuit design, electronic products and biomedical equipment development. The company focuses on research, development, production and sales of brain-computer interfaces and neurobiological chips. The investment marks Tencent's entry into the brain-computer interface sector, joining a growing field of technology companies exploring neural technology applications.

UMB Financial Services
May 26th, 2026
Going off the thumb: why local inference and deterministic tools beat cloud AI.

The recent exposure of Microsoft Copilot Cowork's ability to exfiltrate files through uncontrolled email agents shows how cloud-hosted AI can become a liability rather than an asset[1]. When an agent can send messages to a user's own inbox and leak data via rendered images, the promise of "AI everywhere" collapses into a security nightmare. This is not an isolated glitch. It reflects a broader pattern in which reliance on massive, opaque models hosted by a few providers creates single points of failure that are costly to patch and dangerous to ignore.

At the same time, economic pressure is mounting. Uber's president has said that AI spending is getting harder to justify[17], and analysts argue that moving workloads to local AI will soon be more economical than depending on frontier labs[16]. The cost equation is shifting: running a model on premises or in a modest self-hosted data center avoids the recurring fees, data-transfer charges, and vendor lock-in that come with proprietary APIs. When the bill for a cloud call starts to outweigh the benefit, the case for local inference becomes obvious.

Security, cost, and control converge on a simple principle: if a job can be done deterministically, it should be. Deterministic solutions offer predictable latency, no surprise behavior, and easier auditing. Minicor demonstrates this by providing Windows desktop automation at scale without requiring an AI model to guess UI elements; it scripts interactions directly, delivering reliability that a probabilistic agent cannot match[6].

Paul Graham's observation that AI-generated founder emails now read like hard-hit journalism, and that he instinctively discounts them, highlights how even when LLMs work, their output can feel artificial and untrustworthy[5]. In contexts where consistency matters, a rule-based script or a small, purpose-built tool outperforms a large language model.

Fortunately, the ecosystem for running AI locally is maturing. The Feedback Wanted thread shows a growing movement to bundle open-source apps, models, and pipelines into a single installer that gives anyone a friendly UI to monitor hardware and manage workloads[4]. Harbor's latest release takes this further by letting users launch agentic coding tools with local inference backends such as vLLM, SGLang, or llama.cpp, and even proxy requests through an optimizing LLM gateway[9]. These tools remove the friction that once made self-hosting a hobbyist's project and turn it into a viable production option.

Open models are also becoming more permissive and capable. MOSS-TTS-v1.5 preserves zero-shot voice cloning, long-form speech generation, and multilingual synthesis while strengthening its multilingual abilities[3]. Tencent's Hy-MT2 has been released under the Apache License 2.0, giving firms a clear path to integrate a high-quality translation model without worrying about...

Creative AI News
Apr 9th, 2026
MegaStyle trains FLUX on 1.4M styled images.

Researchers from Tongji University, Tencent, and five other institutions released MegaStyle, a 1.4-million-image dataset purpose-built for style transfer, alongside a FLUX-based model that applies artistic styles to new images. The dataset provides 170,000 style prompts combined with 400,000 content prompts, creating up to 68 billion potential training pairs.

What happened. MegaStyle addresses a core problem in AI style transfer: existing datasets are too small, inconsistent in their style labeling, or lacking in diversity. The team built a scalable data curation pipeline that uses text-to-image models to generate images matching specific style descriptions, drawing source material from JourneyDB (1M images), WikiArt (80K), and LAION-Aesthetics (1M).

The project ships two tools. MegaStyle-FLUX is a diffusion model trained on the full dataset that takes a reference style image and applies its style to new content. MegaStyle-Encoder is a style-specialized image encoder, fine-tuned with contrastive learning, for measuring style similarity and retrieving matching styles.

Why it matters. Style transfer has been possible for years, but quality and consistency have lagged behind other generative AI capabilities. MegaStyle's approach of building a massive, structured dataset first and then training models on it produces measurably better results: the encoder achieves 87.26 mAP@1 on the StyleRetrieval benchmark, with 97.61 Recall@10 for finding similar styles.

For designers and illustrators, the FLUX-based model means applying an artistic style from one reference image to new content with higher fidelity than current alternatives. The encoder adds the ability to search large image collections by visual style rather than just by content or keywords.

Key details.
* Dataset: 1.4M images across 170K style categories, with intra-style consistency and inter-style diversity verified at scale
* MegaStyle-FLUX: Concatenates reference style tokens with noisy image tokens and text inputs in the MM-DiT backbone for style-conditioned generation
* MegaStyle-Encoder: Style-supervised contrastive learning (SSCL) produces embeddings that capture style independently of content
* Contributors: Tongji University, Tencent, NTU Singapore, HKUST, Fuzhou University, HKU, NUS

What to do next. The full research paper details the dataset construction pipeline and benchmark results. The project page provides visual comparisons against existing style transfer methods. Creators working with FLUX-based workflows should watch for code and model weight releases, which would enable integration into existing image generation pipelines.

INACTIVE