How to Choose Your Metagenomics Classification Tool (ccb.jhu.edu)

A short comparison of Kraken 1 vs KrakenUniq vs Kraken 2 vs Centrifuge plus Bracken and Pavian ...

submitted by: Istvan Albert

Metagenomic classification with KrakenUniq on low-memory computers | bioRxiv (www.biorxiv.org)

We have implemented a new algorithm in KrakenUniq that allows it to load and process the database in chunks, with only a modest increase in running time. This enhancement now makes it feasible to run KrakenUniq on very large datasets and huge databases on virtually any computer, even a laptop, while providing the same very high classification accuracy as the previous system.

submitted by: Istvan Albert
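
The chunked loading idea in the KrakenUniq abstract above can be illustrated with a toy sketch. This is not KrakenUniq's actual implementation: the function names (`iter_chunks`, `classify_in_chunks`), the k-mer-to-taxon dictionary, and the majority-vote rule are all assumptions made for illustration. The sketch only shows the general pattern: hold one slice of the database in memory at a time, look up every read's k-mers against that slice, and accumulate hits across slices.

```python
from collections import Counter
from itertools import islice


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def iter_chunks(db_items, chunk_size):
    """Yield the database as dictionaries of at most chunk_size entries.

    db_items is any iterable of (kmer, taxon) pairs, so in a real setting
    it could be streamed from disk instead of held in memory.
    """
    it = iter(db_items)
    while True:
        chunk = dict(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def classify_in_chunks(reads, db_items, k, chunk_size):
    """Assign each read the taxon with the most k-mer hits (hypothetical rule)."""
    hits = {name: Counter() for name in reads}
    # Only one chunk of the database is resident at a time; the reads are
    # re-scanned once per chunk, which is where the extra running time goes.
    for chunk in iter_chunks(db_items, chunk_size):
        for name, seq in reads.items():
            for kmer in kmers(seq, k):
                taxon = chunk.get(kmer)
                if taxon is not None:
                    hits[name][taxon] += 1
    return {
        name: (counts.most_common(1)[0][0] if counts else None)
        for name, counts in hits.items()
    }


# Toy data, invented for this example.
toy_db = [("ACG", "taxA"), ("CGT", "taxA"), ("GTT", "taxB"), ("TTT", "taxB")]
toy_reads = {"read1": "ACGT", "read2": "GTTT", "read3": "CCCC"}

result = classify_in_chunks(toy_reads, toy_db, k=3, chunk_size=2)
```

Because hits are accumulated across chunks, the assignments do not depend on the chunk size; a smaller chunk trades memory for more passes over the reads.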

GitHub - soedinglab/MMseqs2: MMseqs2: ultra fast and sensitive search and clustering suite (github.com)

MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets. MMseqs2 is open source GPL-licensed software implemented in C++ for Linux, MacOS, and (as beta version, via cygwin) Windows. The software is designed to run on multiple cores and servers and exhibits very good scalability. MMseqs2 can run 10000 times faster than BLAST. At 100 times its speed it achieves almost the same sensitivity. It can perform profile searches with the same sensitivity as PSI-BLAST at over 400 times its speed.

submitted by: Istvan Albert

Urgent need for consistent standards in functional enrichment analysis | PLOS Computational Biology (journals.plos.org)

Gene set enrichment tests (a.k.a. functional enrichment analysis) are among the most frequently used methods in computational biology. Despite this popularity, there are concerns that these methods are being applied incorrectly and the results of some peer-reviewed publications are unreliable. [...] Using seven independent RNA-seq datasets, we show misuse of enrichment tools alters results substantially. In conclusion, most published functional enrichment studies suffered from one or more major flaws, highlighting the need for stronger standards for enrichment analysis.

submitted by: Istvan Albert

Why science needs more research software engineers (www.nature.com)

Ten years after their profession got its name, research software engineers seek to swell their ranks.

submitted by: Istvan Albert

Cost-efficient whole genome-sequencing using novel mostly natural sequencing-by-synthesis chemistry and open fluidics platform | bioRxiv (www.biorxiv.org)

We introduce a massively parallel novel sequencing platform that combines an open flow cell design on a circular wafer with a large surface area and mostly natural nucleotides that allow optical end-point detection without reversible terminators. This platform enables sequencing billions of reads with longer read length (~300bp) and fast run times (<20hrs) with high base accuracy (Q30 > 85%), at a low cost of $1/Gb. We establish system performance by whole-genome sequencing of the Genome-In-A-Bottle reference samples HG001-7, demonstrating high accuracy for SNPs (99.6%) and Indels in homopolymers up to length 10 (96.4%) across the vast majority (>98%) of the defined high-confidence regions of these samples.

submitted by: Istvan Albert

Comprehensive assessment of differential ChIP-seq tools guides optimal algorithm selection | Genome Biology | Full Text (genomebiology.biomedcentral.com)

We created standardized reference datasets by in silico simulation and sub-sampling of genuine ChIP-seq data to represent different biological scenarios and binding profiles. Using these data, we evaluated the performance of 33 computational tools and approaches for differential ChIP-seq analysis. Tool performance was strongly dependent on peak size and shape as well as on the scenario of biological regulation.

submitted by: Istvan Albert

