Herald: The Biostar Herald for Monday, November 06, 2023
11 months ago
Biostar 3.0k

The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.

This edition of the Herald was brought to you by contributions from Istvan Albert and was edited by Istvan Albert.


Carp in the Soil. Ridiculous sequencing results revealed… | by Sixing Huang | Medium (dgg32.medium.com)

Ridiculous sequencing results revealed how errors propagated from one research study to a global database

submitted by: Istvan Albert


Nanopore adaptive sampling: a tool for enrichment of low abundance species in metagenomic samples | Genome Biology | Full Text (genomebiology.biomedcentral.com)

Adaptive sampling is a method of software-controlled enrichment unique to nanopore sequencing platforms. To test its potential for enrichment of rarer species within metagenomic samples, we create a synthetic mock community and construct sequencing libraries with a range of mean read lengths. Enrichment is up to 13.87-fold for the least abundant species in the longest read length library; factoring in reduced yields from rejecting molecules the calculated efficiency raises this to 4.93-fold.

submitted by: Istvan Albert


Omics! Omics!: Concept: An Oxford Nanopore Adaptive Sequencing IDE (omicsomics.blogspot.com)

In adaptive sequencing, bases called from the initial sequencing of a fragment can be used to determine whether to continue sequencing or, alternatively, the voltage is reversed for that pore only and the fragment is ejected back to the cis side.

submitted by: Istvan Albert


Genotype prediction of 336,463 samples from public expression data | bioRxiv (www.biorxiv.org)

Here, we developed a statistical model based on the existing reference and alternative read counts from the RNA-seq experiments available through Recount3 to predict genotypes at autosomal biallelic loci in coding regions. We demonstrate the accuracy of our model using large-scale studies that measured both gene expression and genotype genome-wide. We show that our predictive model is highly accurate with 99.5% overall accuracy, 99.6% major allele accuracy, and 90.4% minor allele accuracy. Our model is robust to tissue and study effects, provided the coverage is high enough.

submitted by: Istvan Albert


CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure | Genome Biology | Full Text (genomebiology.biomedcentral.com)

CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess.

submitted by: Istvan Albert


Back to sequences: find the origin of kmers | bioRxiv (www.biorxiv.org)

A vast majority of bioinformatics tools dedicated to the treatment of raw sequencing data heavily use the concept of kmers. This enables us to reduce the data redundancy (and thus the memory pressure), to discard sequencing errors, and to dispose of objects of fixed size that can be manipulated and easily compared to others. A drawback is that the link between each kmer and the original set of sequences it belongs to is generally lost. Given the volume of data considered in this context, recovering this association is costly. In this work, we present "back_to_sequences", a simple tool designed to index a set of kmers of interest, and to stream a set of sequences, extracting those containing at least one of the indexed kmers. In addition, the number of occurrences of kmers in the sequences is provided. Our results show that back_to_sequences streams ~200 short reads per millisecond, making it possible to search for kmers in hundreds of millions of reads in a matter of a few minutes.

submitted by: Istvan Albert
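
For readers who want a feel for the idea in the back_to_sequences abstract above (index a set of kmers, stream sequences, keep those containing at least one indexed kmer, and count occurrences), here is a minimal Python sketch. It is not the tool's implementation or interface: the function names `build_index` and `stream_matches` are made up for illustration, the index is assumed to be a plain in-memory set, and reverse complements are assumed to be ignored.

```python
def build_index(kmers):
    """Store the kmers of interest in a set; all must share one length k."""
    index = {kmer.upper() for kmer in kmers}
    lengths = {len(kmer) for kmer in index}
    if len(lengths) != 1:
        raise ValueError("all kmers must have the same length")
    return index, lengths.pop()


def stream_matches(sequences, index, k):
    """Yield (sequence, hits) for each sequence containing an indexed kmer.

    hits counts every position where an indexed kmer occurs, overlaps
    included. Reverse complements are NOT considered (a simplification).
    """
    for seq in sequences:
        upper = seq.upper()
        hits = sum(
            1
            for i in range(len(upper) - k + 1)
            if upper[i:i + k] in index
        )
        if hits:
            yield seq, hits


# Toy data, invented for this example.
index, k = build_index(["ACGT", "TTTT"])
reads = ["GGACGTAA", "CCCCCC", "TTTTT"]
matches = list(stream_matches(reads, index, k))
for seq, hits in matches:
    print(seq, hits)
```

Because `stream_matches` is a generator, reads are processed one at a time and only the kmer set stays in memory, which mirrors the streaming design the abstract describes, though a real tool would need a far more compact index and faster kmer extraction to reach the reported throughput.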


Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription
