Herald: The Biostar Herald for Thursday, May 22, 2025
7 weeks ago
Biostar 3.6k

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.

This edition of the Herald was brought to you by contributions from Istvan Albert and was edited by Istvan Albert.


Assessing genomic reproducibility of read alignment tools | bioRxiv (www.biorxiv.org)

Genomic research relies on accurate and reproducible computational analyses of DNA sequencing data to draw reliable biological conclusions. Read mapping, the process of aligning reads to a reference genome, is central to many applications, including variant detection and comparative genomics. While several tools have been developed for this task, genomic reproducibility, defined as the consistency of results across replicates, remains underexplored. Here, we address this question by introducing a methodology based on synthetic replicates of sequencing data, generated by perturbing the original reads through shuffling, reverse complementing, or combined shuffling and reverse complementing.
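
To make the three perturbations named in the abstract concrete, here is a minimal Python sketch. It is not the authors' code: the function names, the `(name, sequence, quality)` tuple layout, and the fixed random seed are all assumptions made for illustration. It relies only on the fact that shuffling changes read order while reverse complementing flips each sequence (and reverses its quality string), so the underlying data stays the same.

```python
import random

# Translation table for complementing bases (N stays N); an illustrative choice.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def shuffled(reads, seed=0):
    """Return the reads in a random order; the input list is not modified."""
    rng = random.Random(seed)  # fixed seed so the replicate is repeatable
    out = list(reads)
    rng.shuffle(out)
    return out


def reverse_complemented(reads):
    """Reverse complement every read and reverse its quality string to match."""
    return [(name, reverse_complement(seq), qual[::-1]) for name, seq, qual in reads]


def synthetic_replicates(reads, seed=0):
    """Build the three kinds of replicate described in the abstract."""
    return {
        "shuffled": shuffled(reads, seed),
        "revcomp": reverse_complemented(reads),
        "shuffled_revcomp": reverse_complemented(shuffled(reads, seed)),
    }


# Hypothetical toy reads as (name, sequence, quality) tuples.
reads = [("r1", "ACGTT", "IIIII"), ("r2", "GGCA", "FFFF"), ("r3", "TTNAC", "ABCDE")]
replicates = synthetic_replicates(reads)
```

Each replicate could then be written back to FASTQ and aligned with the tool under test; a fully reproducible aligner would be expected to report equivalent alignments for all of them.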

submitted by: Istvan Albert


Ultrafast and accurate sequence alignment and clustering of viral genomes | Nature Methods (www.nature.com)

Viromics produces millions of viral genomes and fragments annually, overwhelming traditional sequence comparison methods. Here we introduce Vclust, an approach that determines average nucleotide identity by Lempel–Ziv parsing and clusters viral genomes with thresholds endorsed by authoritative viral genomics and taxonomy consortia. Vclust demonstrates superior accuracy and efficiency compared to existing tools, clustering millions of genomes in a few hours on a mid-range workstation.

submitted by: Istvan Albert


Efficient evidence-based genome annotation with EviAnn | bioRxiv (www.biorxiv.org)

For many years, machine learning-based ab initio gene finding approaches have been the central components of eukaryotic genome annotation pipelines, and they remain so today. The reliance on these approaches was originally sustained by the high cost and low availability of gene expression data, a primary source of evidence for gene annotation along with protein homology. Existing annotation packages often underutilize these data sources, which prompted us to develop EviAnn (Evidence-based Annotation), a novel evidence-based eukaryotic gene annotation system. EviAnn takes a strongly data-driven approach, building the exon-intron structure of genes from transcript alignments or protein-sequence homology rather than from purely ab initio gene finding techniques. We show that when provided with the same input data, EviAnn consistently outperforms current state-of-the-art packages including BRAKER3, MAKER2, and FINDER, while utilizing considerably less computer time. Annotation of a mammalian genome can be completed in less than an hour on a single multi-core server.

submitted by: Istvan Albert


Detection of viral sequences at single-cell resolution identifies novel viruses associated with host gene expression changes | Nature Biotechnology (www.nature.com)

We introduce a method that accurately and rapidly detects viral sequences in bulk and single-cell transcriptomics data based on the highly conserved RdRP protein, enabling the detection of over 100,000 RNA virus species. The analysis of viral presence and host gene expression in parallel at single-cell resolution allows for the characterization of host viromes and the identification of viral tropism and host responses. We apply our method to peripheral blood mononuclear cell data from rhesus macaques with Ebola virus disease and describe previously unknown putative viruses. Moreover, we are able to accurately predict viral presence in individual cells based on macaque gene expression.

submitted by: Istvan Albert


The Sources of Researcher Variation in Economics | NBER (www.nber.org)

We use a rigorous three-stage many-analysts design to assess how different researcher decisions (specifically data cleaning, research design, and the interpretation of a policy question) affect the variation in estimated treatment effects. A total of 146 research teams each completed the same causal inference task three times: first with few constraints, then using a shared research design, and finally with pre-cleaned data in addition to a specified design. We find that even when analyzing the same data, teams reach different conclusions.

submitted by: Istvan Albert


Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription
