devCellPy is a machine learning-enabled pipeline for automated annotation of complex multilayered single-cell transcriptomic data

Francisco X. Galdos, Sidra Xu, William R. Goodyer, Lauren Duan, Yuhsin V. Huang, Soah Lee, Han Zhu, Carissa Lee, Nicholas Wei, Daniel Lee, Sean M. Wu

Research output: Contribution to journal › Article › peer-review

30 Scopus citations

Abstract

A major informatic challenge in single-cell RNA-sequencing analysis is the precise annotation of datasets where cells exhibit complex multilayered identities or transitory states. Here, we present devCellPy, a highly accurate and precise machine learning-enabled tool that enables automated prediction of cell types across complex annotation hierarchies. To demonstrate the power of devCellPy, we construct a murine cardiac developmental atlas from published datasets encompassing 104,199 cells from E6.5-E16.5 and train devCellPy to generate a cardiac prediction algorithm. Using this algorithm, we observe a high prediction accuracy (>90%) across multiple layers of annotation and across de novo murine developmental data. Furthermore, we conduct a cross-species prediction of cardiomyocyte subtypes from in vitro-derived human induced pluripotent stem cells and unexpectedly uncover a predominance of left ventricular (LV) identity, which we confirm with an LV-specific TBX5 lineage tracing system. Together, our results show devCellPy to be a useful tool for automated cell prediction across complex cellular hierarchies, species, and experimental systems.

Original language: English
Article number: 5271
Journal: Nature Communications
Volume: 13
Issue number: 1
State: Published - Dec 2022
