Full-Time

Senior Data Engineer 2

Data Curation

Posted on 10/2/2025

Formation Bio

51-200 employees

AI-driven platform accelerates clinical-stage drug development

Compensation Overview

$220k - $280k/yr

+ Equity

Boston, MA, USA + 3 more

More locations: San Francisco, CA, USA | Raleigh, NC, USA | New York, NY, USA

Hybrid

Must reside in NYC, Boston, SF, or Raleigh, or be willing to relocate.

Category
Data & Analytics (2)
Required Skills
Python
Airflow
Data Science
SQL
Requirements
  • 7+ years of experience in data engineering, semantic modeling, or data curation, with leadership experience in technical direction
  • Proven expertise in SQL/dbt modeling and integrating healthcare and biomedical ontologies
  • Hands-on experience with ontology-driven harmonization and data model integration across heterogeneous datasets
  • Strong background in data architecture and stack design, with the ability to define standards and paved paths
  • Experience working with unstructured data: entity extraction (NER), NLP, embeddings, or document parsing
  • Familiarity with vector databases, semantic search, and knowledge graph concepts — and how to connect these with structured datasets for unified consumption
  • Comfortable with Python, orchestration tools (Dagster, Airflow), and working with diverse data types
  • Skilled at collaborating with infrastructure teams to balance semantic integration with scalable foundational tooling
  • Excited to mentor others, set high standards, and drive alignment across a multidisciplinary team
Responsibilities
  • Technical Leadership & Strategy: Define and communicate technical direction for the Data Curation team.
  • Drive the architecture and technical stack for ontology-driven harmonization across healthcare and pharmaceutical datasets.
  • Partner with domain experts (claims, EHR, pharma, research) to align technical standards across diverse datasets.
  • Mentor engineers in best practices for modeling, ontology integration, and scalable curation workflows.
Desired Qualifications
  • Experience with knowledge graph technologies (e.g., Neo4j, RDF/SPARQL, Cypher).
  • Experience with healthcare and life sciences ontologies such as Mondo, OMOP, FHIR, SNOMED, RxNorm, UMLS.
  • Experience harmonizing datasets from EHR, claims, or biomedical research domains.
  • Contributions to enterprise data catalogs or metadata management frameworks.

Formation Bio uses an AI-driven platform to speed up drug development by focusing on clinical-stage assets. It acquires and advances these assets using flexible financing structures, aiming to shorten timelines and cut costs for bringing new drugs to market. Its core product is a set of proprietary AI tools that help with drug selection, development planning, and overall efficiency, supporting partnerships and commercialization with pharmaceutical and biotech companies. Compared with competitors, Formation Bio combines a data-powered approach with adaptable financing and asset acquisition strategies to move assets forward quickly and cost-effectively. The goal is to make treatments more accessible and affordable by accelerating development and enabling faster, more predictable commercialization through AI-enhanced decision making and flexible funding options.

Company Size

51-200

Company Stage

Series D

Total Funding

$530.3M

Headquarters

New York City, New York

Founded

2013

Simplify's Take

What believers are saying

  • Raised $372M Series D funding to accelerate product development and expand pipeline.
  • Appointed EIRs Kia Motesharei, Minji Kim, John Taylor, Anthony Walsh on June 9, 2025.
  • Partners with Sanofi and OpenAI to build custom AI models using proprietary data.

What critics are saying

  • Insilico Medicine advances AI programs to Phase III, starving Formation Bio's pipeline in 12-24 months.
  • FDA 2025 guidance invalidates unproven platform, halting Phase II+ programs in 6-12 months.
  • Owkin sues for federated learning patent infringement, blocking deployment with $200M damages in 18-24 months.

What makes Formation Bio unique

  • Formation Bio's AI platform automates medical writing, protocol development, and patient recruitment across trial lifecycle.
  • Proprietary Clinical Trial Engine uses real-time anomaly detection and automated data cleaning for faster execution.
  • Genetics pipeline integrates human genetic evidence into asset selection for improved decision quality.

Benefits

Company Equity

Remote Work Options

Flexible Work Hours

Growth & Insights and Company News

Headcount

6 month growth

2%

1 year growth

2%

2 year growth

0%
PR Newswire
Jun 9th, 2025
Formation Bio Announces First Class Of Entrepreneurs In Residence To Drive Pipeline Growth And Strategic Partnerships

NEW YORK, June 9, 2025 /PRNewswire/ -- Formation Bio, an AI-driven pharmaceutical company focused on accelerating drug development, today announced the appointment of its first class of Entrepreneurs in Residence (EIRs): Kia Motesharei, Ph.D., Minji Kim, Ph.D., MBA, John Taylor, M.S., and Anthony S. Walsh, D.Phil. These seasoned dealmakers will work closely with Chief Business Officer David Steinberg to expand the company's business development capabilities and further its mission to bring high-potential treatments to patients faster. Formation Bio acquires and in-licenses promising drug assets, advances them through critical development milestones (Phase II and beyond), and then out-licenses them for further advancement. The company's AI platform supports this lifecycle end-to-end, from asset identification and development strategy to execution of clinical trials aimed at greater speed and higher probability of success.

intelligence360
Jul 18th, 2024
TrialSpark dba Formation Bio Has Filed a Notice of an Exempt Offering of Securities to Raise $372,000,000.00 in New Equity Investment

TrialSpark dba Formation Bio has filed a notice of an exempt offering of securities to raise $372,000,000.00 in new equity investment. According to filings with the U.S. Securities and Exchange Commission, TrialSpark dba Formation Bio is raising up to $372,000,000.00 in new funding. Sources indicate that, as part of senior management, Chief Executive Officer Benjamine Liu played a key role in securing the recent investment, which will aid in aggressively expanding the company and in broadening and accelerating product development.

About TrialSpark dba Formation Bio

Formation Bio is a tech-driven pharmaceutical company differentiated by radically more efficient drug development. Formation Bio has built a technology platform that optimizes critical aspects of clinical drug development, enabling more efficient trial design, faster trial completion, and higher quality trial data. The company acquires clinical-stage drugs from pharmaceutical and biotech companies with the goal of developing them faster in order to accelerate access to new treatments for patients and to unlock greater value per program.

To learn more about TrialSpark dba Formation Bio, visit http://www.formation.bio/
TrialSpark dba Formation Bio LinkedIn Page: https://www.linkedin.com/company/formationbio/
Contact: Benjamine Liu, Chief Executive Officer
866-283-7544
https://www.linkedin.com/in/benjamine-liu-54306518/
SOURCE: http://www.intelligence360.io
Copyright (c) 2024 SI360 Inc

FinSMEs
Jun 26th, 2024
Formation Bio Raises $372M in Series D Funding

Formation Bio, a NYC-based tech-driven and AI-native pharma company, raised $372M in Series D funding.

HIT Consultant
May 21st, 2024
Sanofi, Formation Bio, OpenAI Partner on AI-Powered Drug Discovery

What You Should Know:
  • Sanofi, a global pharmaceutical leader, Formation Bio, an AI-powered drug developer, and OpenAI, a leading research and development company in artificial intelligence, have announced a groundbreaking collaboration. The first-of-its-kind partnership in the pharma and life sciences industries aims to leverage AI to accelerate drug development and bring new medicines to patients faster.
  • Sanofi, Formation Bio, and OpenAI believe this collaboration will be a game-changer in the pharmaceutical industry. By combining their expertise in data, AI technology, and drug development, they aim to revolutionize the way new medicines are discovered and brought to market.

A Symphony of Expertise

The collaboration will combine the unique strengths of each partner:
  • Sanofi: Will contribute proprietary data for building custom AI models, furthering their goal of becoming the first biopharma company powered by AI at scale. This data will be crucial for training and refining the AI tools used in drug discovery.
  • OpenAI: World-renowned for its advancements in AI technology, OpenAI will contribute access to cutting-edge AI capabilities

PR Newswire
Jan 10th, 2022
TrialSpark Partners with SomaLogic, Inc.

TrialSpark is also partnering with industry leaders such as SomaLogic to leverage precision genomics and proteomics platforms to identify key biomarkers and stratify patients using synovial fluid samples from prior clinical studies using AI and machine-learning approaches.
