Herald: The Biostar Herald for Monday, December 05, 2022
The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.

This edition of the Herald was brought to you by contributions from Istvan Albert and was edited by Istvan Albert.

KMCP: accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping | bioRxiv (www.biorxiv.org)

We present KMCP, a novel k-mer-based metagenomic profiling tool that introduces genomic positions to nucleotide k-mers by splitting the reference genomes into chunks and then stores k-mers in a modified and optimized COBS index for fast alignment-free sequence searching. The index size of KMCP is smaller than that of COBS, and the batch searching speed is increased by 10 times. KMCP combines k-mer similarity and genome coverage information to reduce the false positive rate of k-mer-based taxonomic classification and profiling methods. Benchmarking results based on simulated and real data demonstrate that KMCP not only allows the accurate taxonomic profiling of prokaryotic and viral populations but also provides confident pathogen detection in clinical samples of low coverage.
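To make the idea in the abstract more concrete, here is a toy sketch of how splitting references into chunks lets a k-mer search report not only which genome a read resembles but also roughly where it lands, so that a genome can be required to have hits spread across several chunks. This is not KMCP's implementation: it uses a plain Python dictionary instead of a COBS index, and the function names, the `min_similarity` threshold and the `min_chunk_fraction` threshold are all assumptions made up for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(genomes, n_chunks, k):
    """Map each k-mer to the (genome, chunk) pairs that contain it.

    Hypothetical stand-in for a real index; a dict is used for clarity.
    """
    index = defaultdict(set)
    for name, seq in genomes.items():
        size = -(-len(seq) // n_chunks)  # ceiling division
        for c in range(n_chunks):
            # Overlap neighbouring chunks by k-1 bases so that
            # no k-mer is lost at a chunk boundary.
            chunk = seq[c * size:(c + 1) * size + k - 1]
            for km in kmers(chunk, k):
                index[km].add((name, c))
    return index


def search_read(index, read, k, min_similarity):
    """Return (genome, chunk) pairs sharing enough of the read's k-mers."""
    query = kmers(read, k)
    if not query:  # read shorter than k
        return set()
    hits = defaultdict(int)
    for km in query:
        for target in index.get(km, ()):
            hits[target] += 1
    return {t for t, n in hits.items() if n / len(query) >= min_similarity}


def profile(index, reads, n_chunks, k,
            min_similarity=0.8, min_chunk_fraction=0.5):
    """Report genomes whose matched chunks cover enough of the reference.

    Both thresholds are illustrative assumptions, not KMCP defaults.
    """
    covered = defaultdict(set)
    for read in reads:
        for name, c in search_read(index, read, k, min_similarity):
            covered[name].add(c)
    result = {}
    for name, chunks in covered.items():
        fraction = len(chunks) / n_chunks
        if fraction >= min_chunk_fraction:
            result[name] = fraction
    return result
```

In this sketch a genome matched by reads in only one of its chunks is dropped, which is the intuition behind combining k-mer similarity with coverage information to cut down on false positives.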

GitHub repository:


submitted by: Istvan Albert

A large-scale study on research code quality and execution | Scientific Data (www.nature.com)

First, we retrieve and analyze more than 2000 replication datasets with over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment to assess its ease of reuse. Common coding errors were identified, and some of them were solved with automatic code cleaning to aid code execution. We find that 74% of R files failed to complete without error in the initial execution, while 56% failed when code cleaning was applied, showing that many errors can be prevented with good coding practices.

submitted by: Istvan Albert

Goodbye, Data Science – r y x, r (ryxcommar.com)

Ahem, some observations hit uncomfortably close.

submitted by: Istvan Albert

Integrative web cloud computing and analytics using MiPair for design-based comparative analysis with paired microbiome data | Scientific Reports (www.nature.com)

In this paper, we thus introduce an integrative web-based tool, named MiPair, for design-based comparative analysis with paired microbiome data. MiPair is a user-friendly web cloud service that is built with step-by-step data processing and analytic procedures for comparative analysis between (or across) groups or between baseline and other groups.

submitted by: Istvan Albert

Recurrent functional misinterpretation of RNA-seq data caused by sample-specific gene length bias | PLOS Biology (journals.plos.org)

Analyzing numerous RNA-seq datasets, we detected a prevalent sample-specific length effect that leads to a strong association between gene length and fold-change estimates between samples.

Gene sets characterized by markedly short genes (e.g., ribosomal protein genes) or long genes (e.g., extracellular matrix genes) are particularly prone to such false calls.
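For readers who want a quick sanity check on their own results, one simple way to look for this kind of association is a rank correlation between gene length and the estimated log2 fold change. The sketch below is not the method used in the paper; the function names and the choice of Spearman correlation are assumptions for illustration, and the input is assumed to be two equal-length lists, one value per gene.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def length_bias(gene_lengths, log2_fold_changes):
    """Spearman correlation between gene length and log2 fold change.

    Hypothetical helper: values far from 0 hint that fold changes
    track gene length in this comparison.
    """
    return pearson(ranks(gene_lengths), ranks(log2_fold_changes))
```

A value near zero does not rule out a length effect, and a value far from zero does not prove one, but a strong correlation is a reasonable prompt to look more closely before interpreting enrichment of very short or very long gene sets.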

submitted by: Istvan Albert


GALBA is a pipeline for fully automated prediction of protein coding gene structures with AUGUSTUS in novel eukaryotic genomes for the scenario where high quality proteins from a closely related species are available.

submitted by: Istvan Albert

