Herald:The Biostar Herald for Monday, February 12, 2024

17 months ago

Biostar 3.6k

The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.

This edition of the Herald was brought to you by contributions from GenoMax, Istvan Albert, and Rob, and was edited by Istvan Albert.

Forseti: A mechanistic and predictive model of the splicing status of scRNA-seq reads | bioRxiv (www.biorxiv.org)

We develop Forseti, a predictive model to probabilistically assign a splicing status to scRNA-seq reads. Our model has two key components. First, we train a binding affinity model to assign a probability that a given transcriptomic site is used in fragment generation. Second, we fit a robust fragment length distribution model that generalizes well across datasets deriving from different species and tissue types. Forseti combines these two trained models to predict the splicing status of the molecule of origin of reads by scoring putative fragments that associate each alignment of sequenced reads with proximate potential priming sites. Using both simulated and experimental data, we show that our model can precisely predict the splicing status of reads and identify the true gene origin of multi-gene mapped reads.

submitted by: Rob

scCensus: Off-target scRNA-seq reads reveal meaningful biology | bioRxiv (www.biorxiv.org)

Overall, our results suggest that off-target scRNA-seq reads contain underappreciated information about various transcriptional activities. These observations about yet-unexploited information in existing scRNA-seq data will help guide and motivate the community to improve current algorithms and analysis methods, and to develop novel approaches that utilize off-target reads to extend the reach and accuracy of single-cell data analysis pipelines.

submitted by: Rob

Is there a semantic difference between GT=./. and GT=0/0 + GQ=0 ? · Issue #756 · samtools/hts-specs · GitHub (github.com)

A recent GATK change replaces GT ./. due to DP=0 with 0/0 (with GQ=0).

As some point out, the two representations are not the same.
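
To make the distinction concrete, here is a minimal sketch in Python that tells the two representations apart in a single sample column. The helper `classify_call` and its labels are made up for illustration; it is not part of GATK, htslib, or any VCF library, and it only looks at the GT and GQ keys.

```python
def classify_call(format_field: str, sample_field: str) -> str:
    """Classify one VCF sample column (hypothetical helper, illustration only)."""
    # Pair FORMAT keys with sample values; trailing fields may be omitted in VCF.
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    # Treat phased (|) and unphased (/) separators the same way here.
    alleles = values.get("GT", ".").replace("|", "/").split("/")
    if all(a == "." for a in alleles):
        return "no-call"
    if all(a == "0" for a in alleles):
        # A reference call whose genotype quality is zero carries no confidence.
        if values.get("GQ", ".") == "0":
            return "hom-ref, zero confidence"
        return "hom-ref"
    return "non-ref"


for sample in ("./.:0:.", "0/0:0:0", "0/0:30:99"):
    print(sample, "->", classify_call("GT:DP:GQ", sample))
```

A filter that compares only the GT string would count `0/0:0:0` as a reference call, which is why the two encodings can lead to different downstream results unless GQ or DP is checked as well.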

submitted by: Istvan Albert

COBRA improves the completeness and contiguity of viral genomes assembled from metagenomes | Nature Microbiology (www.nature.com)

Contig Overlap Based Re-Assembly (COBRA) resolves assembly breakpoints based on the de Bruijn graph and joins contigs.

submitted by: GenoMax

GitHub - schneebergerlab/plotsr: Tool to plot synteny and structural rearrangements between genomes (github.com)

Plotsr generates high-quality visualisation of synteny and structural rearrangements between multiple genomes. For this, it uses the genomic structural annotations between multiple chromosome-level assemblies.

submitted by: Istvan Albert

Accurate quantification of single-cell and single-nucleus RNA-seq transcripts using distinguishing flanking k-mers | bioRxiv (www.biorxiv.org)

Here, we introduce the concept of distinguishing flanking k-mers (DFKs) to improve mapping of sequencing reads. We have developed an algorithm to identify DFKs, which serve as a sophisticated ‘background filter’, enhancing the accuracy of mRNA quantification. This dual strategy of an expanded region of interest coupled with the use of DFKs enhances the precision in quantifying both mature and nascent mRNA molecules, as well as in delineating reads of ambiguous status.

submitted by: Istvan Albert

kmerDB: A Database Encompassing the Set of Genomic and Proteomic Sequence Information for Each Species | bioRxiv (www.biorxiv.org)

In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 45,785 and 22,386 reference genomes and proteomes, respectively, as well as 14,658,776 and 149,264,442 genomic and proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences that are absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
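
The two terms in the abstract can be illustrated on toy data. The sketch below is not how kmerDB is built; it assumes a tiny made-up set of sequences, a single strand (no reverse complements), and a small k so that all possible k-mers can be enumerated. The function names are hypothetical.

```python
from itertools import product


def kmers(seq: str, k: int) -> set:
    """Return the set of all k-mers that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def primes_and_quasi_primes(genomes: dict, k: int, alphabet: str = "ACGT"):
    """Toy version of the abstract's definitions (illustration only).

    primes: k-mers absent from every input sequence.
    quasi-primes: k-mers found in exactly one species.
    """
    per_species = {name: kmers(seq, k) for name, seq in genomes.items()}
    seen = set().union(*per_species.values())
    # Enumerating every possible k-mer is only feasible for small k.
    universe = {"".join(p) for p in product(alphabet, repeat=k)}
    primes = universe - seen
    quasi = {}
    for name, own in per_species.items():
        others = set().union(*(s for n, s in per_species.items() if n != name))
        quasi[name] = own - others
    return primes, quasi


# Made-up sequences, not real genomes.
toy = {"species_a": "ACGT", "species_b": "CGTT"}
primes, quasi = primes_and_quasi_primes(toy, k=2)
print(len(primes), quasi)
```

In this toy input, `AC` occurs only in `species_a` and `TT` only in `species_b`, so each is a quasi-prime for its species, while the 12 dinucleotides that appear in neither sequence are primes.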

submitted by: Istvan Albert

Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription
