Full-Time

High Performance Computing Software Engineer

Supercomputing

Institute of Foundation Models

No salary listed

Abu Dhabi - United Arab Emirates

In Person

Category
Software Engineering
Required Skills
Kubernetes
TensorFlow
PyTorch
Linux/Unix
Requirements
  • Proven experience developing and optimizing software for large-scale ML workloads (1000+ GPUs preferred).
  • Deep understanding of Linux kernel internals and accelerator (GPU) kernel development.
  • Proficiency with distributed communication libraries (e.g., NCCL, RCCL, MPI, UCX, SHARP, Libfabric).
  • Experience with ML frameworks like PyTorch, TensorFlow, JAX, or Megatron-LM.
  • Strong knowledge of HPC job scheduling and orchestration tools (e.g., Slurm, Kubernetes, Pyxis).
  • Excellent debugging and systems performance tuning skills.
  • A collaborative mindset with a focus on shared success and technical excellence.
Responsibilities
  • Design and implement high-performance, distributed software solutions for large-scale AI/ML training.
  • Optimize low-level system components including Linux kernel, GPU/accelerator kernels, and interconnects.
  • Develop and tune communication libraries such as NCCL, MPI, UCX, RCCL, and RDMA-based systems.
  • Partner with ML researchers and engineers to support frameworks like PyTorch, Megatron-LM, and DeepSpeed in large-scale production environments.
  • Contribute to our scheduling, orchestration, and job management systems, including Slurm and Kubernetes.
  • Debug and resolve complex issues across the stack, from kernel to container to model.
  • Work closely with hardware vendors, upstream open-source communities, and internal teams to drive performance and reliability improvements.
Institute of Foundation Models

Company Size

N/A

Company Stage

N/A

Total Funding

N/A

Headquarters

United Arab Emirates

Founded

N/A

Simplify's Take

What believers are saying

  • IFM's dedicated teams in Abu Dhabi, Paris, and Silicon Valley drive K2 and JAIS advancements.
  • Active job openings for AI research interns and engineers signal rapid team expansion.
  • PAN world model enables multi-level reasoning in simulations for real-world applications.

What critics are saying

  • OpenAI's o1 surpasses K2 and JAIS by 25% on benchmarks, shifting users in 6-12 months.
  • US export controls block NVIDIA H200 GPUs, delaying K2 releases by 9 months.
  • Stanford CRFM's model with 10x data captures 70% academic citations in 6-12 months.

What makes Institute of Foundation Models unique

  • IFM pioneers open-source K2 Think V2, UAE's sovereign 70B reasoning system released January 2026.
  • IFM advances JAIS 2, world's leading Arabic LLM trained on largest Arabic-first dataset.
  • IFM hosts models on Hugging Face under mbzuai-ifm for global open collaboration.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

Paid Vacation

Paid Holidays

Parental Leave

Employee Assistance Program

Life Insurance

Disability Insurance

401(k) Plan

Wellness Program

Flexible Work Hours

Remote Work Options

Hybrid Work Options

Stock Options

Company Equity