Internship

Data Science & Machine Learning Intern

Posted on 11/9/2023

Insitro

201-500 employees

Machine learning for drug discovery efficiency

Data & Analytics
Hardware
AI & Machine Learning
Biotechnology
Healthcare

Compensation Overview

$55 - $65 Hourly

San Bruno, CA, USA

Category
AI & Machine Learning
Biology & Biotech
Data & Analytics
Required Skills
Python
Data Science
R
Git
Data Structures & Algorithms
PyTorch
SQL
AWS
Pandas
NumPy
Linux/Unix
Data Analysis
Google Cloud Platform
Requirements
  • Working towards a BS, MS, or Ph.D. in engineering, computational biology, systems biology, computer science, mathematics, statistics, life science, chemistry, physics, or a related field
  • Proficiency in one or more general-purpose programming languages. We primarily use Python
  • Curiosity about human physiology or disease biology
  • Ability to communicate effectively and collaborate with people from a diverse range of backgrounds and job functions
  • Passion for making a difference in the world
Responsibilities
  • Partner directly with a DSML team mentor in developing and/or applying ML methods to process and analyze large scale datasets from multiple modalities over the course of the summer (11 weeks)
  • Perform single cell transcriptomics data analysis, including cell type annotation and modeling of differentiation trajectories using RNA velocity
  • Use bioinformatic methods to perform downstream analysis in order to extract insights about disease mechanisms, such as genes and pathways that are relevant to the therapeutic areas
  • Develop, productionize, and deploy cutting-edge ML approaches to analyze and integrate large-scale multi-modal phenotypic datasets, including multi-omic modalities (single-cell (sc) transcriptomics, sc-ATAC-seq) and imaging (e.g., brightfield, histopathology)
  • Develop ML methods to process and analyze images from multiple microscopy modalities and integrate our in-vitro imaging data to extract insights about disease mechanisms
  • Explore several recent papers on self-supervised learning for images and demonstrate whether they provide practical benefits when applied to insitro's internal biological datasets, compared to our current algorithms
  • Help us integrate new large language models into our analysis tools to help our analysts get more out of our experimental data, faster
  • Develop workflows to enable post-GWAS (Genome-Wide Association Study) analysis of results, e.g., fine-mapping
  • Conduct translational genetics deep dives to enable higher-throughput annotation and exploration of candidate genes from our discovery efforts
  • Build pipelines to better derive and leverage metadata from sequenced cell lines and to incorporate this into image-based ML feature extraction
  • Design statistical methods to improve rare variant burden tests and to improve power in longitudinal phenotypes
  • Develop ML models for imputing disease-relevant phenotypes from high-content clinical imaging or time series data (e.g., histopathology, MRI/PET-CT, EEG, EKG)
  • Develop ML methods for disentangling axes of variation in complex phenotypes
  • Use LLMs to extract disease-relevant information from medical records
  • Build rich embedding models using DNA-Encoded Library (DEL) data, and use these representations for downstream drug discovery tasks such as hit discovery
  • Explore generative models of small molecules in various data modalities such as 2D and 3D representations for hit-to-lead drug discovery efforts
  • Develop new geometric deep learning methods to better characterize nuanced molecular properties and relationships
Desired Qualifications
  • First-hand experience with biological data, preferably using computational approaches
  • Passion for learning how to work with diverse functional genomic assays (RNA/DNase/ATAC/ChIP-seq, etc.)
  • Interest in learning how to analyze single-cell RNA-seq data
  • Solid understanding of computational chemistry, including virtual screening (classic QSAR modeling, structure-based drug discovery), library design, etc.
  • Demonstrated ability to use and develop cutting-edge statistical and machine learning methods inspired by real problems
  • Experience with Machine and Deep Learning frameworks (e.g., scikit-learn, PyTorch, etc.)
  • Demonstrated ability to write high-quality, production-ready code (readable, well-tested, with well-designed APIs)
  • Experience with Linux environments, database languages (e.g., SQL, NoSQL), and version control practices and tools such as Git or Mercurial
  • Publications of high-quality work in relevant computational biology, bioinformatics, systems biology, life sciences, or biomedical venues, including journals and conferences
  • Passion for solving problems, asking questions, and learning independently
  • Familiarity with the SciPy/PyData ecosystem (numpy, pandas, scipy, dask, etc.)
  • Familiarity with cloud computing services (AWS or GCP)
  • Familiarity with statistical analysis software, e.g. R

Insitro focuses on drug discovery and development in the pharmaceutical research sector. The company utilizes machine learning and biological tools to create predictive models that help identify successful paths for new medicines earlier in the process. This method aims to minimize the costly failures that often occur in pharmaceutical R&D. Insitro's team, which includes scientists, engineers, and drug hunters, collaborates to generate and analyze data to enhance the development of future drugs. Unlike traditional methods that rely heavily on intuition, Insitro's approach is data-driven and seeks to avoid unproductive routes in drug discovery. The company primarily serves pharmaceutical companies, healthcare providers, and research institutions looking for more effective drug development strategies. Insitro's goal is to improve the efficiency and success rate of bringing new medicines to market.

Company Stage

Series C

Total Funding

$825.4M

Headquarters

San Francisco, California

Founded

2018

Growth & Insights
Headcount

6 month growth

11%

1 year growth

19%

2 year growth

52%

Simplify's Take

What believers are saying

  • With over $600 million in venture capital funding, Insitro is well-positioned for sustained growth and innovation in the pharmaceutical R&D sector.
  • The appointment of high-caliber professionals like Emily Fox and Philip Tagari strengthens Insitro's capabilities in AI-driven drug discovery and development.
  • Insitro's partnerships and board appointments, such as Amy Abernethy, indicate strong industry recognition and potential for impactful collaborations.

What critics are saying

  • The competitive landscape in AI-driven drug discovery is intense, with well-funded competitors like Xaira Therapeutics posing significant challenges.
  • Ethical and social justice issues related to AI in drug development, such as data privacy and diversity in clinical trials, could pose regulatory and reputational risks.

What makes Insitro unique

  • Insitro integrates machine learning with modern biological tools to create predictive models, setting it apart from traditional pharmaceutical R&D approaches.
  • The company's focus on data-driven insights and predictive modeling aims to reduce the high failure rates in drug discovery, a significant pain point in the industry.
  • Insitro's leadership team, including experts like Emily Fox and Philip Tagari, brings unparalleled expertise in AI, machine learning, and drug development, enhancing its competitive edge.

Benefits

Excellent medical, dental, and vision coverage

Excellent mental health and well-being support

Open vacation policy

Access to free onsite baristas & cafe with daily lunch and breakfast

Access to free onsite fitness center

Commuter benefits

Paid parental leave

Competitive pay and 401(k) matching

Flexible work schedule (on site and remote)

INACTIVE