[Seminar] "GC bias-aware abundance estimation from metagenomic data increases accuracy and replicability" by Laurenz Holcik
Title: "GuaCAMOLE: fragment GC bias-aware abundance estimation from metagenomic data increases accuracy and replicability"
Speaker: Laurenz Holcik, Center of Integrative Bioinformatics, University of Vienna.
Abstract: "The importance of the microbiome in medical research, ecology, and other fields has become apparent with the rapid advances in sequencing technologies. Through sequencing technologies, it is not only possible to determine the presence of, for example, specific bacteria, but one can also estimate the abundance of individual species based on the sequenced reads. However, it is known that read counts as generated by next-generation sequencing protocols are influenced by factors like GC content or library preparation. For the abundance estimation of microbial organisms, the genomic GC content is one major cause of systematic errors. Although this has been described in the literature, there is currently no method to overcome this issue. To account for GC content, we present GuaCAMOLE (Guanine Cytosine Aware Metagenomic Opulence Least squares Estimation), a fast computational method to estimate relative taxon abundance from shotgun metagenomic data. We show on data from a microbial mock community that GuaCAMOLE reduces the relative error of abundance estimates by up to 50% compared to commonly used methods like Bracken and MetaPhlAn2. GuaCAMOLE builds on the assumption that sequencing reads with a similar GC content will be equally affected by any GC bias present in the data. It bins the reads into discrete bins by organism and GC content. Then the method assigns an abundance parameter to each taxon and uses a fast least squares approximation to minimize squared pairwise distances between bins of equal GC ranges. After the minimization, the abundance parameters are used as unbiased estimators. We show that for identical metagenomic samples, the library preparation protocol has a substantial impact on the GC bias. By its model-free approach, GuaCAMOLE automatically adjusts for protocol-specific biases and will thereby contribute to comparability of results in microbiome research.
Additionally, GuaCAMOLE can identify microbial species more reliably by comparing the GC distributions of the observed reads in the sample to the expected distributions from the reference genomes. We show on a metagenomic mock community that GuaCAMOLE reduces the number of falsely detected species by over 75% while still always detecting the truly present species."
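The binning and least squares steps described in the abstract can be illustrated with a minimal sketch. This is not the GuaCAMOLE implementation. It assumes a simple multiplicative model in which the read count of taxon t in GC bin g equals abundance(t) × expected(t, g) × bias(g), where expected(t, g) is the share of the taxon's reads that would fall into that bin without bias (for example, derived from its reference genome). The function name `estimate_abundances`, the input layout, and the use of NumPy are hypothetical choices made for illustration.

```python
import numpy as np


def estimate_abundances(counts, expected):
    """Estimate relative taxon abundances from GC-binned read counts.

    counts[t, g]   : reads assigned to taxon t that fall into GC bin g
    expected[t, g] : fraction of taxon t's reads expected in GC bin g
                     in the absence of GC bias (assumed to be known)

    Assumed model (illustrative, not taken from the talk):
        counts[t, g] = abundance[t] * expected[t, g] * bias[g]
    """
    counts = np.asarray(counts, dtype=float)
    expected = np.asarray(expected, dtype=float)
    n_taxa, n_bins = counts.shape

    # Only bins with observed and expected reads carry usable information.
    usable = (counts > 0) & (expected > 0)
    y = np.full(counts.shape, np.nan)
    y[usable] = np.log(counts[usable] / expected[usable])

    # Within one GC bin every taxon shares the same unknown bias, so the
    # difference y[t, g] - y[u, g] should equal log a[t] - log a[u].
    # Each pair of taxa in each shared GC bin yields one linear equation.
    rows, targets = [], []
    for g in range(n_bins):
        taxa = np.flatnonzero(usable[:, g])
        for i, t in enumerate(taxa):
            for u in taxa[i + 1:]:
                row = np.zeros(n_taxa)
                row[t], row[u] = 1.0, -1.0
                rows.append(row)
                targets.append(y[t, g] - y[u, g])
    if not rows:
        raise ValueError("no GC bin is shared by at least two taxa")

    # Least squares fit of the log abundances. They are determined only up
    # to a common offset, which the final normalization removes. Taxa that
    # share no GC bin (directly or via other taxa) cannot be compared.
    log_a, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    abundances = np.exp(log_a - log_a.max())
    return abundances / abundances.sum()
```

Under this assumed model, the GC bias cancels in every pairwise comparison, so it never has to be estimated explicitly. Summing raw read counts per taxon, by contrast, would mix the bias into the abundance estimates.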
Meeting ID: 998 9605 9788