TY - JOUR
T1 - NGSCheckMate
T2 - Software for validating sample identity in next-generation sequencing studies within and across data types
AU - Lee, Sejoon
AU - Lee, Soohyun
AU - Ouellette, Scott
AU - Park, Woong Yang
AU - Lee, Eunjung A.
AU - Park, Peter J.
N1 - Publisher Copyright: © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2017/6/1
Y1 - 2017/6/1
N2 - In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. Availability: https://github.com/parklab/NGSCheckMate.
UR - https://www.scopus.com/pages/publications/85027019663
U2 - 10.1093/nar/gkx193
DO - 10.1093/nar/gkx193
M3 - Article
C2 - 28369524
AN - SCOPUS:85027019663
SN - 0305-1048
VL - 45
SP - e103
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 11
ER -