The Biostar Herald for Monday, November 21, 2022

The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.

This edition of the Herald was brought to you by contributions from Mensur Dlakic and Istvan Albert, and was edited by Istvan Albert.

submitted by: Istvan Albert

Frontiers | Fifty psychological and psychiatric terms to avoid: a list of inaccurate, misleading, misused, ambiguous, and logically confused words and phrases (www.frontiersin.org)

While not primarily bioinformatics-related, the paper makes several salient points that apply to data analysis and interpretation.

Unsurprisingly, many of the criticized terms are heavily used in biology as well.

submitted by: Istvan Albert

Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets | bioRxiv (www.biorxiv.org)

Here, we perform a critical benchmarking study using 11 methods, including five methods designed specifically for long reads.

Our results show that long-read classifiers generally performed best. Several short-read classification and profiling methods produced many false positives (particularly at lower abundances), required heavy filtering to achieve acceptable precision (at the cost of reduced recall), and produced inaccurate abundance estimates.

submitted by: Istvan Albert

GitHub - marbl/Winnowmap: Long read / genome alignment software (github.com)

Winnowmap is a long-read mapping algorithm optimized for mapping ONT and PacBio reads to repetitive reference sequences. Winnowmap development began on top of the minimap2 codebase and incorporates various ideas to improve mapping accuracy within repeats.

When comparing Winnowmap (v1.0) to minimap2 (v2.17-r954), we observed a reduction in the mapping error-rate from 0.14% to 0.06% in the recently finished human X chromosome, and from 3.6% to 0% within the highly repetitive X centromere (3.1 Mbp).

submitted by: Istvan Albert

GitHub - marbl/MashMap: A fast approximate aligner for long DNA sequences (github.com)

MashMap implements a fast and approximate algorithm for computing local alignment boundaries between long DNA sequences. It can be useful for mapping genome assembly or long reads (PacBio/ONT) to reference genome(s).

As an example, MashMap can map a human genome assembly to the human reference genome in about one minute total execution time and < 4 GB memory using just 8 CPU threads, achieving more than an order of magnitude improvement in both runtime and memory over alternative methods.

submitted by: Istvan Albert

Single-sequence protein structure prediction using a language model and deep learning | Nature Biotechnology (www.nature.com)

Protein language models work better than AlphaFold2 on orphan sequences

submitted by: Mensur Dlakic

Retained introns in long RNA-seq reads are not reliably detected in sample-matched short reads | Genome Biology | Full Text (genomebiology.biomedcentral.com)

We compared introns detected by 8 tools using short RNA-seq reads with introns observed in long RNA-seq reads from the same biological specimens. We found significant disagreement among tools (Fleiss’ κ=0.113) such that 47.7% of all detected intron retentions were not called by more than one tool. We also observed poor performance of all tools, with none achieving an F1-score greater than 0.26, and qualitatively different behaviors between general-purpose alternative splicing detection tools and tools confined to retained intron detection.

submitted by: Istvan Albert
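
For readers less familiar with the two statistics quoted in the retained-intron item above, here is a minimal Python sketch of how an F1-score and Fleiss' κ are computed. It is not the paper's code: the function names `f1_score` and `fleiss_kappa` and all numbers in the example are made up for illustration, and the kappa function assumes every intron was rated by the same number of tools.

```python
from typing import Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters (here: tools) that put subject i
    (here: an intron) into category j (e.g. "retained" / "not retained").
    Every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement among rater pairs, per subject.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    if p_e == 1.0:
        return float("nan")            # undefined: all ratings in one category
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(f"F1 = {f1_score(tp=26, fp=74, fn=74):.2f}")
    # Four made-up introns rated by 8 tools: [called retained, not called].
    toy_table = [[8, 0], [1, 7], [4, 4], [0, 8]]
    print(f"kappa = {fleiss_kappa(toy_table):.3f}")
```

A κ near 0 means the tools agree about as often as chance alone would predict, and a value of 1 means perfect agreement, so the reported κ=0.113 indicates agreement only slightly better than chance.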

