Herald:The Biostar Herald for Monday, September 05, 2022
12 months ago
Biostar 2.0k

The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.

This edition of the Herald was brought to you by contributions from Istvan Albert and Wayne, and was edited by GenoMax and Istvan Albert.

DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer | Nature Biotechnology (www.nature.com)

Circular consensus sequencing with Pacific Biosciences (PacBio) technology generates long (10–25 kilobases), accurate ‘HiFi’ reads by combining serial observations of a DNA molecule into a consensus sequence. The standard approach to consensus generation, pbccs, uses a hidden Markov model. We introduce DeepConsensus, which uses an alignment-based loss to train a gap-aware transformer–encoder for sequence correction.

submitted by: Istvan Albert

UniProt has been redesigned, and a lot of code and approaches for accessing it need updating. This package, Unipressed by Michael Milton, provides a way to query the new UniProt API programmatically from Python.

submitted by: Wayne
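For readers updating old scripts, here is a minimal sketch of what a request to the redesigned UniProt REST service can look like. It only builds the URL and makes no network call. The endpoint `https://rest.uniprot.org/uniprotkb/search`, the parameter names, and the example query are assumptions based on the public UniProt REST API, not something taken from the linked package, and the helper `build_search_url` is hypothetical. Unipressed has its own interface, which is not shown here.

```python
from urllib.parse import urlencode

# Assumed base URL of the new UniProtKB search endpoint.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fields=("accession", "gene_names"), size=5):
    """Return a search URL for the (assumed) UniProtKB REST endpoint.

    query  -- a UniProt query string, e.g. "gene:BRCA1 AND organism_id:9606"
    fields -- return fields to request (names are assumptions)
    size   -- maximum number of results per page
    """
    params = {
        "query": query,
        "format": "json",
        "fields": ",".join(fields),
        "size": size,
    }
    # urlencode percent-escapes ':' and ',' and turns spaces into '+'.
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


url = build_search_url("gene:BRCA1 AND organism_id:9606")
print(url)
```

The resulting URL can be passed to any HTTP client. A dedicated client library is likely the better choice for paging and typed queries.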


We compared TPM, FPKM, normalized counts using DESeq2 and TMM approaches, and we examined the impact of using variance stabilizing Z-score normalization on TPM-level data as well. We found that for our datasets, both DESeq2 normalized count data (i.e., median of ratios method) and TMM normalized count data generally performed better than the other quantification measures.

submitted by: Istvan Albert
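As a rough illustration of the median-of-ratios idea mentioned in the item above, the following sketch computes per-sample size factors from a small count matrix in plain Python. It is not DESeq2's code. The function names `size_factors` and `normalize` and the toy counts are made up for this example, and genes with a zero count in any sample are simply skipped when estimating the factors.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios idea.

    counts -- list of genes, each a list of raw counts (one per sample).
    """
    n_samples = len(counts[0])
    # Use only genes that are nonzero in every sample, so logs are defined.
    log_rows = [[math.log(c) for c in row]
                for row in counts if all(c > 0 for c in row)]
    if not log_rows:
        raise ValueError("no gene has nonzero counts in all samples")
    # Log of each gene's geometric mean across samples (pseudo-reference).
    log_ref = [sum(row) / n_samples for row in log_rows]
    # Size factor = median over genes of count / reference, on the log scale.
    return [
        math.exp(median(row[j] - ref for row, ref in zip(log_rows, log_ref)))
        for j in range(n_samples)
    ]


def normalize(counts):
    """Divide every count by its sample's size factor."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data: sample 2 was sequenced exactly twice as deeply as sample 1.
toy = [[10, 20], [100, 200], [5, 10], [0, 0]]
factors = size_factors(toy)
normalized = normalize(toy)
print(factors)
print(normalized)
```

In this toy case the second size factor is twice the first, so the normalized counts agree between the two samples.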

GitHub - eelhaik/PCA_critique (github.com)

GitHub repository for the paper: Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated

submitted by: Istvan Albert

Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated | Scientific Reports (www.nature.com)

Principal Component Analysis (PCA) is a multivariate analysis that reduces the complexity of datasets while preserving data covariance. The outcome can be visualized on colorful scatterplots, ideally with only a minimal loss of information. [...] We analyzed twelve common test cases using an intuitive color-based model alongside human population data. We demonstrate that PCA results can be artifacts of the data and can be easily manipulated to generate desired outcomes.

submitted by: Istvan Albert
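To make the item above a little more concrete, here is a toy sketch of how the first principal axis of a 2-D dataset can shift when one group is oversampled. It is not taken from the paper or its repository. The three "populations" A, B and C, their coordinates, and the helper `first_pc_angle` are invented for illustration.

```python
import math


def first_pc_angle(points):
    """Return the angle (radians) of the first principal axis of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix.
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed form for the leading eigenvector direction of a symmetric 2x2 matrix.
    return 0.5 * math.atan2(2 * cxy, cxx - cyy)


# Three hypothetical populations, one point each.
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)

balanced = [A, B, C]          # equal sample sizes
skewed = [A] + [B] * 10 + [C]  # population B oversampled tenfold

angle_balanced = first_pc_angle(balanced)
angle_skewed = first_pc_angle(skewed)
print(math.degrees(angle_balanced), math.degrees(angle_skewed))
```

The same three populations give a different first axis once the sampling is unbalanced, which is one simple way a PCA plot can depend on how the data were assembled.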

Identifying and correcting repeat-calling errors in nanopore sequencing of telomeres | Genome Biology | Full Text (genomebiology.biomedcentral.com)

Nanopore long-read sequencing is an emerging approach for studying genomes, including long repetitive elements like telomeres. Here, we report extensive basecalling induced errors at telomere repeats across nanopore datasets, sequencing platforms, basecallers, and basecalling models. We find that telomeres in many organisms are frequently miscalled. We demonstrate that tuning of nanopore basecalling models leads to improved recovery and analysis of telomeric regions, with minimal negative impact on other genomic regions.

submitted by: Istvan Albert

Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription
