Herald: The Biostar Herald for Monday, December 11, 2023
11 weeks ago
Biostar 2.5k

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.

This edition of the Herald was brought to you by contributions from Istvan Albert and cmdcolin, and was edited by Istvan Albert.

vcfdist: accurately benchmarking phased small variant calls in human genomes | Nature Communications (www.nature.com)

Accurately benchmarking small variant calling accuracy is critical for the continued improvement of human whole genome sequencing. In this work, we show that current variant calling evaluations are biased towards certain variant representations and may misrepresent the relative performance of different variant calling pipelines. We propose solutions, first exploring the affine gap parameter design space for complex variant representation and suggesting a standard. Next, we present our tool vcfdist and demonstrate the importance of enforcing local phasing for evaluation accuracy.
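To make the representation issue concrete, here is a minimal Python sketch showing that different VCF-style records can describe the same haplotype. It is not vcfdist and does not use its algorithm; the function name `apply_variants`, the toy reference, and the records are all made up for illustration.

```python
def apply_variants(reference, variants):
    """Apply non-overlapping (pos, ref, alt) records to a reference string.

    pos is 1-based, as in VCF. Hypothetical helper for illustration only.
    """
    seq = reference
    # Apply from right to left so that earlier coordinates stay valid.
    for pos, ref, alt in sorted(variants, reverse=True):
        start = pos - 1
        if seq[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele {ref!r} does not match at position {pos}")
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq


reference = "TGCACACAG"  # toy sequence with a CA repeat

# The same 2 bp deletion, placed at the left or at the right of the repeat.
left_aligned = [(2, "GCA", "G")]
right_shifted = [(6, "ACA", "A")]

# The same substitution, written as one multi-base record or as two SNPs.
as_mnp = [(3, "CA", "TG")]
as_snps = [(3, "C", "T"), (4, "A", "G")]

print(apply_variants(reference, left_aligned) == apply_variants(reference, right_shifted))
print(apply_variants(reference, as_mnp) == apply_variants(reference, as_snps))
```

A comparison that matches records field by field would treat each pair above as different calls, even though the resulting sequences are identical.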

submitted by: Istvan Albert

It looks at the practical issue of variant representation, e.g. in VCF format, and offers a formal analysis of these issues.

submitted by: cmdcolin

GitHub - brentp/jigv: igv.js standalone page generator and automatic configuration to view bam/cram/vcf/bed. "working in under 1 minute" (github.com)

igv.js is a great way to view alignments and other genomic data. It requires that the files are hosted on a server with access to the original data.

jigv encodes all variants, alignments, and annotations into a single HTML page (or as a separate javascript file for each variant) that you can send to collaborators who don't have access to the cluster where your data is stored.

The resulting file is very fast to navigate; the left/right arrow keys advance to next/previous variants of interest.

submitted by: Istvan Albert

Waste not, want not: revisiting the analysis that called into question the practice of rarefaction (journals.asm.org)

Over the past 10 years, the best method for normalizing the sequencing depth of samples characterized by 16S rRNA gene sequencing has been contentious. An often cited article by McMurdie and Holmes forcefully argued that rarefying the number of sequence counts was “inadmissible” and should not be employed. However, I identified a number of problems with the design of their simulations and analysis that compromised their results. In fact, when I reproduced and expanded upon their analysis, it was clear that rarefaction was actually the most robust approach for controlling for uneven sequencing effort across samples. Rarefaction limits the rate of falsely detecting and rejecting differences between treatment groups. Far from being “inadmissible”, rarefaction is a valuable tool for analyzing microbiome sequence data.
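For readers unfamiliar with the term, rarefying a sample means subsampling its reads without replacement down to a common depth. The Python sketch below shows that one step only. It is not the code from the article; the function name `rarefy` and the toy counts are assumptions made for illustration.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Subsample a {taxon: count} mapping to exactly `depth` reads.

    Reads are drawn without replacement. Hypothetical helper, not taken
    from any published pipeline.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"cannot rarefy {total} reads to depth {depth}")
    rng = random.Random(seed)
    # Expand the table to one label per read, then draw `depth` of them.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(reads, depth)))


# Toy sample with 100 reads, rarefied to 40 reads.
sample = {"taxon_A": 50, "taxon_B": 30, "taxon_C": 20}
rarefied = rarefy(sample, depth=40, seed=1)
print(rarefied, sum(rarefied.values()))
```

In practice every sample in a study would be rarefied to the same depth, and samples with fewer reads than that depth would be dropped rather than upsampled.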

submitted by: Istvan Albert

Phantom oscillations in principal component analysis (www.pnas.org)

These oscillatory patterns are a mathematical consequence of the way PCA is computed rather than a unique property of the data. We show how two common properties of high-dimensional data can be misinterpreted when visualized in a small number of dimensions.
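The effect can be sketched without any real data. The pure-Python example below assumes a covariance that depends only on how far apart two time points are (a Gaussian-shaped fall-off, chosen here purely for illustration) and extracts its leading principal components by power iteration. It is not the authors' code, and all function names and parameters are hypothetical.

```python
import math


def smooth_covariance(n, width):
    """Covariance between time points i and j that depends only on |i - j|."""
    return [[math.exp(-((i - j) ** 2) / (2.0 * width ** 2)) for j in range(n)]
            for i in range(n)]


def top_components(cov, k, iters=500):
    """Leading k eigenvectors of a symmetric matrix via power iteration."""
    n = len(cov)
    comps = []
    for c in range(k):
        # Deterministic, asymmetric start vector.
        v = [1.0 + 0.37 * math.sin(1.7 * i + c) for i in range(n)]
        for _ in range(iters):
            w = [sum(row[j] * v[j] for j in range(n)) for row in cov]
            # Project out the components that were already found.
            for u in comps:
                d = sum(a * b for a, b in zip(w, u))
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        comps.append(v)
    return comps


def sign_changes(v, eps=1e-9):
    """Count how often consecutive (non-negligible) entries change sign."""
    signs = [x > 0 for x in v if abs(x) > eps]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


components = top_components(smooth_covariance(n=40, width=5.0), k=3)
print([sign_changes(pc) for pc in components])
```

No oscillation was put into the covariance, yet each successive component crosses zero more often than the previous one, which is the wave-like pattern that is easy to over-interpret.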

submitted by: Istvan Albert

The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes | bioRxiv (www.biorxiv.org)

Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution.

submitted by: Istvan Albert


We present pygenomics, a Python package for working with genomic intervals and bioinformatic data files. The package implements interval operations, provides both API and CLI, and supports reading and writing data in widely used bioinformatic formats, including BAM, BED, GFF3, and VCF. The source code of pygenomics is provided with in-source documentation and type annotations and adheres to the functional programming paradigm.
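As a rough idea of what interval operations written in a functional style can look like, here is a small Python sketch. It does not use pygenomics and does not reflect its API; the `Interval` type and the `intersect` and `merge` functions are hypothetical, and the coordinates assume the BED convention (0-based start, exclusive end).

```python
from typing import Iterable, List, NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlap of two intervals, or None if they do not overlap."""
    if a.chrom != b.chrom:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Interval(a.chrom, start, end) if start < end else None


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent intervals; inputs are not modified."""
    merged: List[Interval] = []
    for iv in sorted(intervals):
        if merged and merged[-1].chrom == iv.chrom and iv.start <= merged[-1].end:
            last = merged[-1]
            merged[-1] = last._replace(end=max(last.end, iv.end))
        else:
            merged.append(iv)
    return merged


exons = [
    Interval("chr1", 15, 30),
    Interval("chr1", 10, 20),
    Interval("chr2", 5, 8),
    Interval("chr1", 40, 50),
]
print(merge(exons))
print(intersect(Interval("chr1", 10, 20), Interval("chr1", 15, 30)))
```

Both functions return new values instead of changing their arguments, which keeps them easy to compose and test.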

submitted by: Istvan Albert
