Herald:The Biostar Herald for Monday, April 10, 2023
Biostar

The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.

This edition of the Herald was brought to you by contributions from Istvan Albert and Pavel, and was edited by Istvan Albert.

Correcting PCR amplification errors in unique molecular identifiers to generate absolute numbers of sequencing molecules. | bioRxiv (www.biorxiv.org)

Unique Molecular Identifiers (UMIs) are random oligonucleotide sequences that remove PCR amplification biases. However, the impact that PCR associated sequencing errors have on the accuracy of generating absolute counts of RNA molecules is underappreciated. We show that PCR errors are the main source of inaccuracy in both bulk and single-cell sequencing data, and synthesizing UMIs using homotrimeric nucleotide blocks provides an error correcting solution, that allows absolute counting of sequenced molecules.

submitted by: Istvan Albert
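
To make the homotrimeric UMI idea in the entry above concrete, here is a minimal Python sketch. It assumes each UMI base is synthesized as a block of three identical nucleotides and that an error within a block can be resolved by majority vote; the function names are hypothetical and the preprint's actual correction procedure may differ.

```python
from collections import Counter


def encode_homotrimer(umi: str) -> str:
    """Repeat every base three times, e.g. 'ACG' -> 'AAACCCGGG'."""
    return "".join(base * 3 for base in umi)


def decode_homotrimer(read: str) -> str:
    """Collapse each block of three bases by majority vote.

    A block with no majority (three different bases) is reported as 'N'.
    This is an illustrative assumption, not the preprint's implementation.
    """
    if len(read) % 3 != 0:
        raise ValueError("homotrimer UMI length must be a multiple of 3")
    decoded = []
    for start in range(0, len(read), 3):
        block = read[start:start + 3]
        base, count = Counter(block).most_common(1)[0]
        decoded.append(base if count >= 2 else "N")
    return "".join(decoded)


encoded = encode_homotrimer("ACGT")      # 'AAACCCGGGTTT'
with_error = "AAACTCGGGTTT"              # one substitution in the second block
recovered = decode_homotrimer(with_error)
print(encoded, recovered)
```

A single substitution inside a block is outvoted by the two unchanged bases, so the original UMI is recovered. Two errors in the same block would not be corrected by this simple scheme.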

Pooling animal samples to a lower number of replicates vs. sequencing a subgroup of the animal (www.biostars.org)

I found this question quite interesting, so I seem to have gone off the deep-end a bit investigating (perhaps I just didn't want to do my real work today!).

My initial thought is that pooling samples would artificially reduce the estimate of biological variance and would, therefore, cause an increase in the false positive rate.

submitted by: Istvan Albert
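
The variance argument in the entry above can be illustrated with a small simulation. This is a rough sketch under strong assumptions: expression per animal is drawn from a normal distribution, every animal contributes equally to its pool, and there is no technical noise. The function name and parameters are hypothetical. Under these assumptions, the variance between pools of size k is expected to be about 1/k of the variance between individual animals.

```python
import random
import statistics


def between_unit_variance(n_units: int, pool_size: int,
                          sigma: float = 1.0, seed: int = 0) -> float:
    """Sample variance across units, where each unit is the mean of
    `pool_size` simulated animals (pool_size=1 means no pooling)."""
    rng = random.Random(seed)
    unit_values = []
    for _ in range(n_units):
        animals = [rng.gauss(0.0, sigma) for _ in range(pool_size)]
        unit_values.append(statistics.fmean(animals))
    return statistics.variance(unit_values)


individual_var = between_unit_variance(20000, pool_size=1)
pooled_var = between_unit_variance(20000, pool_size=4)

# With sigma = 1, expect roughly 1.0 for individuals and roughly 0.25 for pools of 4.
print(f"individual animals: {individual_var:.3f}")
print(f"pools of 4:         {pooled_var:.3f}")
```

The pooled values look less variable than the underlying animals, which is why a test that treats pools as if they were individual animals can understate biological variance.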

On the utility of RNA sample pooling to optimize cost and statistical power in RNA sequencing experiments | BMC Genomics | Full Text (bmcgenomics.biomedcentral.com)

For high within-group gene expression variability, small RNA sample pools are effective to reduce the variability and compensate for the loss of the number of replicates. Unlike the typical cost-saving strategies, such as reducing sequencing depth or number of RNA samples (replicates), an adequate pooling strategy is effective in maintaining the power of testing DGE for genes with low to medium abundance levels, along with a substantial reduction of the total cost of the experiment. In general, pooling RNA samples or pooling RNA samples in conjunction with moderate reduction of the sequencing depth can be good options to optimize the cost and maintain the power.

submitted by: Istvan Albert


We present IDseq, an open source cloud-based metagenomics pipeline and service for global pathogen detection and monitoring (https://idseq.net). The IDseq Portal accepts raw mNGS data, performs host and quality filtration steps, then executes an assembly-based alignment pipeline, which results in the assignment of reads and contigs to taxonomic categories. The taxonomic relative abundances are reported and visualized in an easy-to-use web application to facilitate data interpretation and hypothesis generation.

submitted by: Istvan Albert

GitHub - gatk-workflows/gatk4-data-processing: Workflows for processing high-throughput sequencing data for variant discovery with GATK4 and related tools (github.com)

The processing-for-variant-discovery-gatk4 WDL pipeline implements data pre-processing according to the GATK Best Practices. The workflow takes as input an unmapped BAM list file (a text file containing paths to unmapped BAM files) to perform preprocessing tasks such as mapping, marking duplicates, and base recalibration. It produces a single BAM file and its index suitable for variant discovery analysis using tools such as HaplotypeCaller.

submitted by: Istvan Albert

GRAPE: genomic relatedness detection pipeline | F1000Research (f1000research.com)

GRAPE is a free, open-source pipeline for relatedness detection in genomic data. It is fast, reliable and accurate for both close and distant degrees of kinship, and it combines all the necessary processing steps to work on real data. https://github.com/genxnetwork/grape

submitted by: Pavel

