Question

Herald: The Biostar Herald for Monday, October 24, 2022

2


2.7 years ago

Biostar 3.6k

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.

This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by Istvan Albert.

Urgent need for consistent standards in functional enrichment analysis | PLOS Computational Biology (journals.plos.org)

... concerns that statistical problems and incomplete reporting are compromising research quality. In this article, we conducted a systematic examination of published enrichment analyses and assessed whether (i) any statistical flaws were present and (ii) sufficient methodological detail is provided such that the study could be replicated. We found that lack of methodological detail and errors in statistical analysis were widespread, which undermines the reliability and reproducibility of these research articles. A set of best practices is urgently needed to raise the quality of published work.

submitted by: Istvan Albert

Did someone mention peer review? Every scientist should read this historical perspective from @Melinda_Baldwin. Scientific review practices are post-Cold War and not introduced to improve papers. The past is a strange land. The future maybe stranger. https://t.co/sOhax0aYKB pic.twitter.com/jPlBGvvqAZ
— Richard McElreath 🦔 (@rlmcelreath) October 21, 2022

submitted by: Istvan Albert

Twelve years of SAMtools and BCFtools - PubMed (pubmed.ncbi.nlm.nih.gov)

The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines.

submitted by: Istvan Albert

Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows - PubMed (pubmed.ncbi.nlm.nih.gov)

To our surprise, we have recently discovered that this intuition is incorrect. Instead, BLAST returns the first N hits that exceed the specified E-value threshold, which may or may not be the highest scoring N hits. The invocation using the parameter ‘-max_target_seqs 1’ simply returns the first good hit found in the database, not the best hit as one would assume. Worse yet, the output produced depends on the order in which the sequences occur in the database.

submitted by: Istvan Albert
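To make the claim in the abstract above concrete, here is a toy sketch of the difference between "first N hits that pass the E-value threshold" and "best N hits". It is not BLAST code and makes no statement about how BLAST actually works (the reply further down this page disputes the paper's main statement). The hit tuples, the function names, and the numbers are all made up for illustration.

```python
from typing import List, Tuple

# Hypothetical hit record: (subject_id, bit_score, e_value).
Hit = Tuple[str, float, float]


def first_n_passing(hits: List[Hit], evalue_cutoff: float, n: int) -> List[Hit]:
    """Toy model of the behavior described in the abstract: keep the first
    n hits, in database order, whose E-value passes the cutoff."""
    passing = [h for h in hits if h[2] <= evalue_cutoff]
    return passing[:n]


def best_n_passing(hits: List[Hit], evalue_cutoff: float, n: int) -> List[Hit]:
    """What most users expect: the n highest-scoring hits that pass the cutoff."""
    passing = [h for h in hits if h[2] <= evalue_cutoff]
    return sorted(passing, key=lambda h: h[1], reverse=True)[:n]


# Made-up hits listed in the order the subjects occur in a pretend database.
db_order: List[Hit] = [
    ("seqA", 50.0, 1e-5),
    ("seqB", 200.0, 1e-50),
    ("seqC", 120.0, 1e-30),
]
reordered = list(reversed(db_order))

print(first_n_passing(db_order, 1e-3, 1))   # first passing hit: seqA
print(best_n_passing(db_order, 1e-3, 1))    # highest-scoring hit: seqB
print(first_n_passing(reordered, 1e-3, 1))  # changes with database order: seqC
print(best_n_passing(reordered, 1e-3, 1))   # unaffected by order: seqB
```

Under this toy model, the "first N" result depends on database order while the "best N" result does not, which is the order dependence the abstract warns about. A cautious workflow could request more hits than it needs and pick the top scorer itself.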

GSearch: Ultra-Fast and Scalable Microbial Genome Search by combining Kmer Hashing with Hierarchical Navigable Small World Graphs | bioRxiv (www.biorxiv.org)

We developed a new program, GSearch, that is at least ten times faster than alternative tools for the same purposes while maintaining high accuracy. GSearch can identify/classify eight thousand query genomes against all available microbial and viral genomic species within several minutes on a personal laptop, using only ~6GB of memory. Further, GSearch can scale well with millions of database genomes based on a database splitting strategy.

submitted by: Istvan Albert

Benchmarking taxonomic classifiers with Illumina and Nanopore sequence data for clinical metagenomic diagnostic applications - PubMed (pubmed.ncbi.nlm.nih.gov)

Simulating both with and without sequencing error for both the Illumina and Oxford Nanopore platforms, we evaluated commonly used classification tools including Kraken2, Bracken and Centrifuge, utilizing mini (8 GB) and standard (30-50 GB) databases. Bracken with the standard database performed best, the median percentage of reads across both sequencing platforms identified correctly to the species level was 97.8% (IQR 92.7:99.0) [range 5:100]. For Kraken2 with a mini database, a commonly used combination, median species-level identification was 86.4% (IQR 50.5:93.7) [range 4.3:100].

submitted by: Istvan Albert

High discrepancy between Primary estimate and Bootstraps from Salmon (www.biostars.org)

Biostar Q&A at its best.

A challenging question on a more sophisticated usage scenario for the salmon transcript quantification software, answered promptly and thoroughly by Prof. Rob Patro, the lead developer of said tool.

We wish bioinformatics Q&A always worked this well!

One more reason (if you ever needed another one) to use salmon, the wicked-fast transcript quantifier:

https://salmon.readthedocs.io/en/latest/

submitted by: Istvan Albert

Most Random Gene Expression Signatures Are Significantly Associated with Breast Cancer Outcome | PLOS Computational Biology (journals.plos.org)

We demonstrated that in breast cancer any set of 100 genes or more selected at random has a 90% chance to be significantly associated with outcome. Thus, investigators are bound to find an association however whimsical their marker is.

submitted by: Istvan Albert
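One practical reading of the result above is that a signature should be compared against random gene sets of the same size, not only against a nominal significance threshold. The sketch below shows that comparison as an empirical p-value. It is a simplified stand-in, not the paper's procedure: `gene_stat` is a hypothetical per-gene association statistic, the score is just the absolute mean of those statistics, and the data are synthetic, with a shared positive shift so that most genes look associated with outcome.

```python
import random
from statistics import mean


def signature_score(gene_stat, genes):
    """Hypothetical signature score: absolute mean of per-gene statistics."""
    return abs(mean(gene_stat[g] for g in genes))


def empirical_p_value(gene_stat, signature, n_random=1000, seed=0):
    """Fraction of random same-size gene sets scoring at least as high as
    the signature (with the usual +1 correction)."""
    rng = random.Random(seed)
    all_genes = sorted(gene_stat)
    observed = signature_score(gene_stat, signature)
    at_least_as_extreme = 0
    for _ in range(n_random):
        random_set = rng.sample(all_genes, len(signature))
        if signature_score(gene_stat, random_set) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_random + 1)


# Synthetic per-gene statistics (assumption: no real expression data here).
# The mean of 1.0 mimics a background in which most genes track outcome.
data_rng = random.Random(42)
gene_stat = {f"gene{i}": data_rng.gauss(1.0, 1.0) for i in range(2000)}

# An arbitrary 100-gene "signature" and a deliberately cherry-picked one.
arbitrary_signature = [f"gene{i}" for i in range(100)]
top_signature = sorted(gene_stat, key=gene_stat.get, reverse=True)[:100]

p_arbitrary = empirical_p_value(gene_stat, arbitrary_signature)
p_top = empirical_p_value(gene_stat, top_signature)
print(f"arbitrary signature: empirical p = {p_arbitrary:.3f}")
print(f"top-100 signature:   empirical p = {p_top:.4f}")
```

The point of the design is that the null distribution comes from random gene sets drawn from the same data, so a signature only stands out if it beats what arbitrary genes achieve.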

Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription


updated 2.7 years ago by shelkmike ★ 1.6k • written 2.7 years ago by Biostar 3.6k

score 2 · Answer 1 · 2022-10-24



2.7 years ago

shelkmike ★ 1.6k

The main statement of the article "Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows" has already been disproved; see https://academic.oup.com/bioinformatics/article/35/15/2699/5259186?login=true
