Comprehensive variant discovery in the era of complete human reference genomes | Nature Methods (www.nature.com)

Advances in long-read sequencing technologies have broadened our understanding of genetic variation in the human population, uncovered new complex structural variants and offered an opportunity to elucidate new variant associations with disease.

submitted by: Istvan Albert

Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing | Nature Methods (www.nature.com)

The year 2022 will be remembered as the turning point for accurate long-read sequencing, which now establishes the gold standard for speed and accuracy at competitive costs. We discuss the key bioinformatics techniques needed to power long reads across application areas and close with our vision for long-read sequencing over the coming years.

submitted by: Istvan Albert

A field-wide assessment of differential high throughput sequencing reveals widespread bias | bioRxiv (www.biorxiv.org)

Analysis of GEO submission file structures places an overall 56% upper limit to reproducibility without querying other sources. We further show that only 23% of experiments resulted in theoretically expected p value histogram shapes, although both reproducibility and p value distributions show marked improvement over time.

submitted by: Istvan Albert

Principal component analysis | Nature Reviews Methods Primers (www.nature.com)

Principal component analysis is a versatile statistical method for reducing a cases-by-variables data table to its essential features, called principal components. Principal components are a few linear combinations of the original variables that maximally explain the variance of all the variables. [...] This Primer presents a comprehensive review of the method’s definition and geometry, as well as the interpretation of its numerical and graphical results.

submitted by: Istvan Albert
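
For readers who want a concrete feel for "linear combinations of the original variables that maximally explain the variance", here is a minimal sketch in plain Python. It is not taken from the Primer or its companion code. The function name `first_principal_component`, the toy data in `toy_rows` and the use of power iteration are all assumptions made for illustration; a real analysis would normally rely on an established numerical library.

```python
import math


def first_principal_component(rows, iterations=200):
    """Estimate the first principal component of a cases-by-variables table.

    Hypothetical helper for illustration only. Returns a unit-length
    direction vector and the fraction of total variance it explains.
    Assumes at least two cases and at least one non-constant variable.
    """
    n = len(rows)
    p = len(rows[0])

    # Center each variable (column) on its mean.
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]

    # Sample covariance matrix (p x p).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize. This converges to the leading eigenvector as long as
    # the starting vector is not orthogonal to it (assumed here).
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Variance along v (Rayleigh quotient) divided by the total variance.
    variance_along_v = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    total_variance = sum(cov[i][i] for i in range(p))
    return v, variance_along_v / total_variance


# Made-up toy table: 10 cases, 2 strongly correlated variables.
toy_rows = [
    (2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
    (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9),
]

if __name__ == "__main__":
    direction, ratio = first_principal_component(toy_rows)
    print("direction:", direction)
    print("explained variance ratio:", ratio)
```

Because the two toy variables rise and fall together, a single component captures most of the variance, which is the sense in which PCA reduces a table to its essential features.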

Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets | BMC Bioinformatics | Full Text (bmcbioinformatics.biomedcentral.com)

Our results show that long-read classifiers generally performed best. Several short-read classification and profiling methods produced many false positives (particularly at lower abundances), required heavy filtering to achieve acceptable precision (at the cost of reduced recall), and produced inaccurate abundance estimates. By contrast, two long-read methods (BugSeq, MEGAN-LR & DIAMOND) and one generalized method (sourmash) displayed high precision and recall without any filtering required. Furthermore, in the PacBio HiFi datasets these methods detected all species down to the 0.1% abundance level with high precision.

submitted by: Istvan Albert

ViralConsensus: A fast and memory-efficient tool for calling viral consensus genome sequences directly from read alignment data | bioRxiv (www.biorxiv.org)

ViralConsensus is a fast and memory-efficient tool for calling viral consensus genome sequences directly from read alignment data. ViralConsensus is orders of magnitude faster and more memory-efficient than existing methods. Further, unlike existing methods, ViralConsensus can pipe data directly from a read mapper via standard input and performs viral consensus calling on-the-fly, making it an ideal tool for viral sequencing pipelines.

submitted by: Istvan Albert

The PCA review is really intriguing, and I'm sure it is a must-read for many people, but unfortunately it is fully paywalled, even for Nature subscribers who lack an "extra" subscription. There is a GitHub repository behind it, though: https://github.com/michaelgreenacre/PCA


