Herald: The Biostar Herald for Monday, September 19, 2022
17 months ago
Biostar 2.5k

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.

This edition of the Herald was brought to you by contributions from Istvan Albert and Rob, and was edited by Istvan Albert.

Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2 | Genome Biology | Full Text (genomebiology.biomedcentral.com)

Cuttlefish 2 is a tool for efficiently computing the compacted de Bruijn graph (or a simplitig covering of the de Bruijn graph) from either raw sequencing reads or reference genomes. It is fast and memory-efficient: for example, it can construct the compacted de Bruijn graph on a set of 661K bacterial genomes in 16 hours and 30 minutes using only 48.7 GB of RAM. The Cuttlefish 2 implementation is available in the cuttlefish GitHub repository and via bioconda.

submitted by: Rob
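
For readers unfamiliar with the term, here is a toy sketch of what "compacting" a de Bruijn graph means: every maximal non-branching chain of k-mers is collapsed into a single string (a unitig). This is not the Cuttlefish 2 algorithm and has none of its scalability. It is a naive in-memory illustration, and it assumes a node-centric graph that ignores reverse complements. The function name `compacted_de_bruijn` is made up for this example.

```python
def compacted_de_bruijn(seqs, k):
    """Return the sorted unitigs of a toy node-centric de Bruijn graph.

    Assumptions (illustration only): forward strand only, DNA alphabet
    ACGT, everything held in memory.
    """
    # Nodes are the distinct k-mers; an edge u -> v exists when the
    # (k-1)-suffix of u equals the (k-1)-prefix of v and both are present.
    nodes = set()
    for s in seqs:
        for i in range(len(s) - k + 1):
            nodes.add(s[i:i + k])

    def succ(u):
        return [u[1:] + c for c in "ACGT" if u[1:] + c in nodes]

    def pred(u):
        return [c + u[:-1] for c in "ACGT" if c + u[:-1] in nodes]

    def extends_right(u):
        # u merges with its successor only if the link is unambiguous
        # in both directions (and is not a self-loop).
        s = succ(u)
        return len(s) == 1 and len(pred(s[0])) == 1 and s[0] != u

    def starts_unitig(u):
        p = pred(u)
        return not (len(p) == 1 and extends_right(p[0]))

    unitigs, seen = [], set()

    def walk(start):
        path, cur = start, start
        seen.add(start)
        while extends_right(cur):
            cur = succ(cur)[0]
            if cur in seen:  # closed an isolated cycle
                break
            seen.add(cur)
            path += cur[-1]
        unitigs.append(path)

    # First pass: walk from every k-mer that begins a non-branching chain.
    for u in sorted(nodes):
        if starts_unitig(u):
            walk(u)
    # Second pass: whatever is left lies on isolated cycles.
    for u in sorted(nodes):
        if u not in seen:
            walk(u)
    return sorted(unitigs)


if __name__ == "__main__":
    # Two reads sharing the prefix ACGT, then diverging.
    print(compacted_de_bruijn(["ACGTT", "ACGTA"], 3))
```

With k=3 the two reads share the chain ACG -> CGT, which collapses into the unitig ACGT, while the two branches GTA and GTT stay as separate nodes.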

From defaults to databases: parameter and database choice dramatically impact the performance of metagenomic taxonomic classification tools | bioRxiv (www.biorxiv.org)

We found large discrepancies in both the proportion of reads that were classified as well as the number of species that were identified when we used both Kraken2 and MetaPhlAn 3 to classify reads within metagenomes from human-associated or environmental datasets.

submitted by: Istvan Albert

Batch effects removal for microbiome data via conditional quantile regression | Nature Communications (www.nature.com)

Batch effects in microbiome data arise from differential processing of specimens and can lead to spurious findings and obscure true signals.

We apply ConQuR to simulated and real microbiome datasets and demonstrate its advantages in removing batch effects while preserving the signals of interest.

submitted by: Istvan Albert

[2209.04308] Computational reproducibility of Jupyter notebooks from biomedical publications (arxiv.org)

Here, we analyze the computational reproducibility of 9625 Jupyter notebooks from 1117 GitHub repositories associated with 1419 publications indexed in the biomedical literature repository PubMed Central. 8160 of these were written in Python [...] out of these, 396 notebooks ran through without any errors,

submitted by: Istvan Albert

GitHub - lh3/nasw: Dynamic programming for aa-to-nt alignment with affine gap, splicing and frameshift (github.com)

nasw provides an implementation of dynamic programming (DP) for protein-to-genome alignment with affine-gap penalty, splicing and frameshifts. The DP involves 6 states and 20 transitions, similar to the GeneWise model. Different from GeneWise, nasw explicitly implements the DP recursion with SSE2 or NEON intrinsics and is tens of times faster. Please see nasw.h for the brief API documentation.

submitted by: Istvan Albert
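
As background on the "affine-gap" part, below is a minimal sketch of the classic three-state affine-gap recursion (Gotoh-style global alignment) in plain Python. It is not the nasw model: nasw's DP has 6 states with splicing and frameshift transitions and is vectorized, while this sketch only scores two plain strings. The function name, the scoring parameters, and the convention that a gap of length L costs `gap_open + (L - 1) * gap_extend` are assumptions chosen for the example.

```python
NEG_INF = float("-inf")


def affine_gap_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Global alignment score with affine gaps (illustrative sketch).

    States: M = a[i-1] aligned to b[j-1],
            X = a[i-1] aligned to a gap,
            Y = b[j-1] aligned to a gap.
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0
    for i in range(1, n + 1):  # leading gap in b
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):  # leading gap in a
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Enter M from any state on the diagonal.
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Opening a gap costs gap_open; staying in it costs gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)

    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(affine_gap_score("ACGT", "ACGT"))  # identical sequences
    print(affine_gap_score("ACGT", "AT"))    # one gap of length 2
```

Keeping a separate table per state is what lets a long gap be cheaper than several short ones; the nasw model extends the same idea with extra states for introns and frameshifts.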

Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2 | Genome Biology | Full Text (genomebiology.biomedcentral.com)

The de Bruijn graph is a key data structure in modern computational genomics, and construction of its compacted variant resides upstream of many genomic analyses. As the quantity of genomic data grows rapidly, this often forms a computational bottleneck. We present Cuttlefish 2, significantly advancing the state-of-the-art for this problem. On a commodity server, it reduces the graph construction time for 661K bacterial genomes, of size 2.58Tbp, from 4.5 days to 17–23 h; and it constructs the graph for 1.52Tbp white spruce reads in approximately 10 h, while the closest competitor requires 54–58 h, using considerably more memory.

submitted by: Istvan Albert

Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription
