A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing

  • Hyeonwoo Kim
  • , Jiwon Kim
  • , Ji Won Choi
  • , Kwang Sung Ahn
  • , Dong Il Park
  • , Sangsoo Kim

Research output: Contribution to journalArticlepeer-review

2 Scopus citations

Abstract

Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of sequencing reads during quality control, particularly in datasets with large number samples. We present a streamlined pipeline that integrates FastP for read trimming, HmmUFOtu for operational taxonomic units (OTU) clustering, Vsearch for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipe-line’s performance, we reprocessed two published stool datasets of normal Korean populations: one with 890 and the other with 1,462 independent samples. In the first dataset, HmmUFOtu retained 93.2% of over 104 million read pairs after quality trimming, discard-ing chimeric or unclassifiable reads, while DADA2, a commonly used ASV method, retained only 44.6% of the reads. Nonetheless, both methods yielded qualitatively similar β-diversity plots. For the second dataset, HmmUFOtu retained 89.2% of read pairs, while DADA2 retained a mere 18.4% of the reads. HmmUFOtu, being a closed-reference clustering method, facilitates merging separately processed datasets, with shared OTUs between the two datasets exhibiting a correlation coefficient of 0.92 in total abundance (log scale). While the first two dimensions of the β-diversity plot exhibited a cohesive mixture of the two datasets, the third dimension revealed the presence of a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides valuable insights into their performance when processing large-scale microbial 16S rRNA amplicon sequencing data. The strengths of HmmUFOtu and its potential for dataset merging are highlighted.

Original languageEnglish
Article numbere40
JournalGenomics and Informatics
Volume21
Issue number3
DOIs
StatePublished - Sep 2023
Externally publishedYes

Keywords

  • DADA2
  • HmmUFOtu
  • amplicon sequencing
  • metagenomics

Fingerprint

Dive into the research topics of 'A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing'. Together they form a unique fingerprint.

Cite this