Full-Time

Research Scientist

Data

Institute of Foundation Models

Compensation Overview

$150k - $450k/yr

+ Bonus

H1B Sponsorship Available

Sunnyvale, CA, USA

In Person

Category
Data & Analytics
Required Skills
LLM
Data Science
Requirements
  • Minimum: Master’s degree in Computer Science, Data Science, or a related technical field, or equivalent practical experience.
Responsibilities
  • Pioneer web-scale data collection and curation methodologies for large language models and multi-modal foundation models.
  • Design and implement novel data synthesis pipelines for code, mathematics, and agentic reasoning datasets.
  • Trace the impact of data from pre-training to final model capabilities and create automated quality assessment frameworks for massive datasets.
  • Design data recipes that maximize model capabilities across diverse domains.
  • Optimize data-model co-design for improved training dynamics.
  • Contribute to research papers and represent MBZUAI at industry conferences and events, showcasing the institution’s AI research and innovation.
Desired Qualifications
  • Preferred: PhD or equivalent research experience in Machine Learning, Natural Language Processing, or Data Science, with a focus on large language models and data.
  • Prior research experience in areas such as web data curation and mixing, complex dataset synthesis for training, LLM evaluation, post-training data, efficient inference, LLM-as-a-judge, and tokenization.
  • Strong publication record in leading AI conferences (e.g., NeurIPS, ICLR, ICML, EMNLP) and/or prior contributions to open-source AI research or data tools.
  • Hands-on experience training language/multimodal models from scratch.
Institute of Foundation Models

Company Size

N/A

Company Stage

N/A

Total Funding

N/A

Headquarters

United Arab Emirates

Founded

N/A

Simplify's Take

What believers are saying

  • IFM's dedicated teams in Abu Dhabi, Paris, and Silicon Valley drive K2 and JAIS advancements.
  • Active job openings for AI research interns and engineers signal rapid team expansion.
  • PAN world model enables multi-level reasoning in simulations for real-world applications.

What critics are saying

  • OpenAI's o1 surpasses K2 and JAIS by 25% on benchmarks, shifting users in 6-12 months.
  • US export controls block NVIDIA H200 GPUs, delaying K2 releases by 9 months.
  • Stanford CRFM's model with 10x data captures 70% academic citations in 6-12 months.

What makes Institute of Foundation Models unique

  • IFM pioneers the open-source K2 Think V2, the UAE's sovereign 70B reasoning system, released January 2026.
  • IFM advances JAIS 2, the world's leading Arabic LLM, trained on the largest Arabic-first dataset.
  • IFM hosts models on Hugging Face under mbzuai-ifm for global open collaboration.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

Paid Vacation

Paid Holidays

Parental Leave

Employee Assistance Program

Life Insurance

Disability Insurance

401(k) Plan

Wellness Program

Flexible Work Hours

Remote Work Options

Hybrid Work Options

Stock Options

Company Equity