Full-Time

Scientist - Screening

Functional Genomics

Posted on 7/10/2025

Arc Institute

201-500 employees

Non-profit research institute advancing foundational science

Compensation Overview

$121.3k - $150k/yr

+ Annual Discretionary Bonus

Palo Alto, CA, USA

In Person

Category
Biology & Biotech (5)
Required Skills
Machine Learning
Asana
Data Analysis
Requirements
  • PhD in Cell Biology, Molecular Biology, Medicine, Genomics, or another relevant field. Post-PhD experience is a plus.
  • Deep experience with CRISPR-Cas9 and other genome editing or perturbation techniques in a high-throughput format (e.g., pooled or large arrayed screens).
  • Experience in mammalian cell culture, and familiarity with primary cells and/or other difficult cell models. Experience working with stem cells and differentiated cell types is a strong plus.
  • Hands-on experience with downstream genomics protocols, especially scRNA-seq.
  • Experience using project management systems, ELNs, and LIMS (e.g., Asana, Benchling).
  • Demonstrated ability to work both independently and in a highly collaborative multidisciplinary environment.
  • Strong project management skills with the ability to independently plan, execute, and deliver results on time.
  • Excellent written and verbal communication skills.
Responsibilities
  • Independently execute large in vitro perturbation screens in diverse cell types, generate high-quality single-cell perturbation datasets, and collaborate cross-functionally to analyze and interpret research findings.
  • Work with our interdisciplinary teams to further optimize functional genomics platforms and CRISPR technologies.
  • Interact cross-functionally with computational and machine learning teams to devise strategies to train and evaluate applications of an AI virtual cell model.
  • Use project management systems to track progress and adhere to timelines, and document experimental data in the ELN/LIMS in an organized manner.
  • Embrace the opportunity to collaborate creatively with Core Investigators on ad hoc early-stage research projects, and with other Technology Centers to advance our Virtual Cell Atlas and Alzheimer's Disease Initiatives.
  • Stay up to date on advances in functional genomics and single-cell omics by reading the literature and attending key conferences.
  • Present key research findings at internal meetings and seminars, and external conferences.
Desired Qualifications
  • Experience working with stem cells (e.g., ESC, iPSC) and stem cell-derived cell types.
  • Experience with designing and cloning large pooled CRISPR sgRNA libraries is a strong plus.
  • Experience in multi-color flow cytometry and FACS-based sorting.
  • Ability to analyze large and complex data sets using Python, R, or similar tools.
  • Experience in molecular biology and/or protein engineering.

Arc Institute is a non-profit research institution in Palo Alto that pursues curiosity-driven basic science and technology development, aiming to accelerate scientific progress and shorten the path from discovery to patient impact. It collaborates with Stanford University, UCSF, and UC Berkeley, and organizes its work around people rather than specific projects, supporting long-term research agendas. Researchers team across disciplines to study root causes of diseases such as cancer, neurodegeneration, and immune dysfunction, with a focus on understanding disease mechanisms and deploying new technologies at scale to enable practical applications.

Company Size

201-500

Company Stage

N/A

Total Funding

N/A

Headquarters

Palo Alto, California

Founded

2021

Simplify Jobs

Simplify's Take

What believers are saying

  • Plans for BioReason 3 extend reasoning to virtual cell and single-cell models.
  • NVIDIA collaboration provides BioNeMo platform and DGX Cloud for AI development.
  • Fully open-sourced models, code, and predictions for 240,000 proteins accelerate adoption.

What critics are saying

  • NVIDIA restricts DGX Cloud access to prioritize Illumina within 12-24 months.
  • Stanford's Evo 2 fragments Arc's IP and diverts pharma funding in 6-12 months.
  • Xaira Therapeutics commercializes closed derivatives, eroding Arc's moat in 18-36 months.

What makes Arc Institute unique

  • BioReason-Pro integrates ESM3 embeddings with GO-GPT for biologist-like protein reasoning.
  • Achieves 73.6% Fmax on CAFA5, surpassing prior models on low-homology proteins.
  • Experts prefer its annotations over UniProt in 79% of blinded evaluations.

Benefits

Performance Bonus

Growth & Insights and Company News

Headcount

6 month growth

0%

1 year growth

2%

2 year growth

3%
XROM
Mar 21st, 2026
Proteins can now "talk": meet BioReason-Pro - The world's first AI reasoning model that thinks like a biologist.

Proteins can now "talk." And AI is finally listening. For decades, one of biology's most stubborn bottlenecks has been hiding in plain sight. We know the sequences. We just don't know what they mean. There are now over 250 million protein sequences catalogued in UniProt - but fewer than 0.1% carry experimental functional annotations (ResearchGate). The sequencing revolution gave us an ocean of data. But we have been nearly blind to what most of it actually does. Until now.

This week, researchers at the Arc Institute, in collaboration with teams from Stanford, UC Berkeley, UCSF, ETH Zürich, EPFL, the University of Toronto, and Cohere, unveiled BioReason-Pro - the first multimodal reasoning large language model for protein function prediction that integrates protein embeddings with biological context to generate structured reasoning traces (ResearchGate). In plain language: it's an AI that doesn't just label proteins. It thinks about them - the way a world-class biologist would.

The problem with how biology AI has worked - until now

Most existing computational tools approach protein function the same way a student might approach a multiple-choice exam: given a sequence, pick the most likely label. It works, but it misses something fundamental about how biology actually operates. A protein's function emerges from the interplay of sequence, structure, evolutionary context, and decades of accumulated ontological knowledge - yet most AI models in biology still operate in their own individual domains (Chalmers tekniska högskola). Real biologists don't work that way. They synthesize evidence from protein domains, 3D structures, interaction partners, organism context, and the broader literature before committing to a functional hypothesis. BioReason-Pro is the first AI system built to mirror that integrative process from the ground up.
How BioReason-Pro works: reasoning, not just predicting

BioReason-Pro combines ESM3 protein embeddings, a Gene Ontology graph encoder, and biological context to generate structured reasoning traces and functional annotations (UPMC). Rather than producing a single output label, the model walks through its logic step-by-step - from molecular evidence, through domain analysis, to a structured functional hypothesis covering molecular function, biological process, cellular localization, and candidate interaction partners. A critical engine inside the system is GO-GPT - an autoregressive transformer that treats GO annotation as a sequence generation task conditioned on protein representations, capturing hierarchical and cross-aspect dependencies of GO terms (UPMC). BioReason-Pro was trained via supervised fine-tuning on synthetic reasoning traces generated by GPT-5 for over 130,000 proteins, and further optimised through reinforcement learning (ResearchGate). The result is a model that doesn't just pattern-match - it reasons under biological constraints, just as a trained scientist would.

The results: numbers that should stop you in your tracks

The benchmarks are striking:

  • 73.6% weighted Fmax on GO term prediction - surpassing all prior accessible CAFA5 baselines, the field's gold standard competition
  • Strong performance on low-homology proteins - exactly where classical sequence-similarity methods fail most badly
  • 79% expert preference rate - in blinded evaluation, human protein experts preferred BioReason-Pro annotations over curated UniProt annotations in 79% of cases (ResearchGate), with an average LLM judge score of 8/10 on functional summaries

Perhaps most remarkably, BioReason-Pro de novo predicted experimentally confirmed binding partners, with per-residue attention localising to the exact contact residues resolved in cryo-EM structures of those complexes (UPMC) - meaning the model identified, without being told, the precise molecular regions that matter.
That's not prediction. That's understanding.

Why this is a paradigm shift - not just a better benchmark

The significance here goes beyond a leaderboard jump. For the past several years, progress in biology AI has been driven by better encoders, larger protein language models like ESM, and stronger structure predictors like AlphaFold. Those advances were transformative. But they all share a common limitation: they produce answers, not explanations. The reasoning traces BioReason-Pro generates are a new kind of output: hypotheses with supporting evidence, proposed mechanisms, and testable interaction partners (Chalmers tekniska högskola). For a scientist, that distinction is everything. An annotation you can interrogate, challenge, and build on is infinitely more valuable than a black-box label - no matter how accurate. This is the shift from AI as oracle to AI as research collaborator.

The architecture behind the breakthrough

The model integrates data from 133,492 proteins across 3,135 organisms, curated from UniProt with experimental GO annotations, InterPro domains, STRING protein-protein interactions, and PDB protein structures (UPMC). It was evaluated on a strict temporal split - training data through November 2022, test data from March 2023 to February 2024 - ensuring the benchmarks reflect genuine generalisation, not memorisation. The base model is built on Qwen3-4B, giving the system its chain-of-thought reasoning capabilities, layered with the biological multimodal inputs that let it move from sequence to function in a way no previous system has achieved.

Fully open. Fully accessible. Right now.

In an era where major AI breakthroughs are increasingly locked behind paywalls and proprietary APIs, BioReason-Pro is making a different bet.
The team has released everything:

  • Preprint paper (bioRxiv)
  • Full codebase (GitHub)
  • Model weights and training data
  • Live web application at bioreason.net - with predictions available for over 240,000 proteins including the entire Human Protein Atlas

Any researcher, anywhere in the world, can use it today.

What this means for drug discovery, disease research & the future of biology

The downstream implications are hard to overstate. The vast majority of proteins in the human body - and across all of life - remain functionally uncharacterised. Every dark corner of the proteome is a potential drug target, a disease mechanism, a biological process we don't yet understand. BioReason-Pro demonstrates that AI systems can reason about protein function at expert level, opening a path toward scalable functional characterisation of the millions of uncharacterised proteins across all domains of life (UPMC). For drug discovery, that means faster target identification. For rare disease research, it means shining light on proteins that would never attract enough experimental funding to be characterised by hand. For basic science, it means the interpretive bottleneck that has shadowed the genomic revolution may finally be lifting. Biology has always been an integrative reasoning problem. For the first time, AI is built to match that.

The bottom line

BioReason-Pro isn't just a better protein classifier. It's a new kind of scientific instrument - one that reads molecular evidence, constructs a biological argument, and delivers a reasoned conclusion that human experts find more useful than the best manually curated database entries in existence. The proteins have started talking. AI is finally fluent enough to listen.
Try BioReason-Pro: bioreason.net | Read the preprint: bioRxiv 2026.03.19 | Access the code: GitHub - BioReason-Pro

ABOUT THE RESEARCH TEAM: BioReason-Pro was developed by researchers at the Arc Institute, Stanford University, UC Berkeley, UCSF, University of Toronto, ETH Zürich, EPFL, Cohere, and Xaira Therapeutics, led by Adibvafa (Adib) Fallahpour (NVIDIA) and Hani Goodarzi (Arc Institute).
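
The Fmax figure quoted in the article is the protein-centric F-measure used in CAFA-style evaluations, taken at the score threshold that maximises it. For readers unfamiliar with the metric, here is a minimal Python sketch. It is an illustration only, not BioReason-Pro's evaluation code: the function name `fmax`, the toy GO identifiers, and the optional per-term `weights` argument (standing in for the information-content weights behind a "weighted" Fmax) are assumptions made here, and unlike the official CAFA evaluators this sketch does not propagate annotations up the GO hierarchy.

```python
from typing import Dict, Optional, Set


def fmax(truth: Dict[str, Set[str]],
         scores: Dict[str, Dict[str, float]],
         weights: Optional[Dict[str, float]] = None,
         steps: int = 100) -> float:
    """Protein-centric Fmax: best F1 over score thresholds in (0, 1].

    truth   maps protein id -> set of true GO terms
    scores  maps protein id -> {GO term: predicted score in [0, 1]}
    weights optionally maps GO term -> weight (hypothetical stand-in
            for information content); missing terms default to 1.0
    """
    def w(term: str) -> float:
        return 1.0 if weights is None else weights.get(term, 1.0)

    best = 0.0
    for i in range(1, steps + 1):
        t = i / steps
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            predicted = {g for g, s in scores.get(protein, {}).items() if s >= t}
            hit = sum(w(g) for g in predicted & true_terms)
            true_w = sum(w(g) for g in true_terms)
            # Recall is averaged over every benchmark protein.
            recalls.append(hit / true_w if true_w > 0 else 0.0)
            # Precision is averaged only over proteins with a prediction at t.
            pred_w = sum(w(g) for g in predicted)
            if pred_w > 0:
                precisions.append(hit / pred_w)
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy data with made-up GO identifiers.
truth = {"P1": {"GO:1", "GO:2"}, "P2": {"GO:3"}}
scores = {
    "P1": {"GO:1": 0.9, "GO:9": 0.5, "GO:2": 0.3},  # GO:9 is a false positive
    "P2": {"GO:3": 0.8},
}
print(round(fmax(truth, scores), 3))  # 0.909
```

In this toy case the best threshold is a low one: keeping all three predictions for P1 costs some precision (2/3) but recovers both true terms, giving precision 5/6 and recall 1 across the two proteins.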

BiopharmaTrend
Jun 24th, 2025
Arc Institute Releases its First Virtual Cell Model

To support future model assessment, Arc has also introduced a "Cell_Eval" benchmarking framework tailored for virtual cell models.

The Daily Californian
Feb 27th, 2025
AI model from Arc Institute can generate strands of DNA

Arc Institute, a research organization that operates in collaboration with UC Berkeley, has released an artificial intelligence model that can classify and generate strands of DNA.

RNA-Seq Blog
Feb 27th, 2025
Arc Virtual Cell Atlas launches, combining data from over 300 million cells

Arc Institute today launched the Arc Virtual Cell Atlas, a growing resource for computation-ready single-cell measurements, starting with data from over 300 million cells.

BiopharmaTrend
Feb 25th, 2025
Vevo Therapeutics Open-Sources Largest Single-Cell Dataset with Arc Institute

Vevo Therapeutics has officially released the Tahoe-100M, described as the world's largest single-cell dataset, in collaboration with the Arc Institute.
