The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contributions from Istvan Albert, and was edited by Istvan Albert.
STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci | Genome Biology | Full Text (genomebiology.biomedcentral.com)
Expansions of short tandem repeats (STRs) cause many rare diseases. [...] We developed STRling to efficiently count k-mers to recover informative reads and call expansions at known and novel STR loci. [...] It is fast, scalable, open-source, and available at: github.com/quinlan-lab/STRling.
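To give a feel for the k-mer counting idea in the abstract, here is a minimal sketch in plain Python. It is not STRling's code: the function names, the choice of k=3, and the 0.8 cutoff are all assumptions made for illustration, and it ignores reverse complements. It only shows how counting k-mers can flag a read that is dominated by a single repeat unit.

```python
from collections import Counter


def canonical_unit(kmer):
    """Return the smallest rotation, so CAG, AGC and GCA collapse to one unit."""
    return min(kmer[i:] + kmer[:i] for i in range(len(kmer)))


def dominant_repeat_unit(read, k=3):
    """Count the k-mers of a read and report the most common repeat unit.

    Returns (unit, fraction of all k-mers in the read that belong to it).
    """
    counts = Counter(
        canonical_unit(read[i:i + k]) for i in range(len(read) - k + 1)
    )
    if not counts:  # read shorter than k
        return None, 0.0
    unit, n = counts.most_common(1)[0]
    return unit, n / sum(counts.values())


def is_str_rich(read, k=3, min_fraction=0.8):
    """Hypothetical cutoff: call a read informative if one unit dominates it."""
    _, fraction = dominant_repeat_unit(read, k)
    return fraction >= min_fraction


if __name__ == "__main__":
    print(dominant_repeat_unit("CAG" * 20))
    print(is_str_rich("ACGTTGCATGCCGATAGCTAGGCTTAACG"))
```

Collapsing rotations matters because a CAG repeat read from a different offset looks like AGC or GCA; without that step the counts would be split three ways.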
submitted by: Istvan Albert
Navigating bottlenecks and trade-offs in genomic data analysis | Nature Reviews Genetics (www.nature.com)
Traditionally, the primary bottleneck in genomic analysis pipelines has been the sequencing itself, which has been much more expensive than the computational analyses that follow. However, an important consequence of the continued drive to expand the throughput of sequencing platforms at lower cost is that often the analytical pipelines are struggling to keep up with the sheer amount of raw data produced. Computational cost and efficiency have thus become of ever increasing importance.
submitted by: Istvan Albert
PyDESeq2: a python package for bulk RNA-seq differential expression analysis | bioRxiv (www.biorxiv.org)
We present PyDESeq2, a python implementation of the DESeq2 workflow for differential expression analysis on bulk RNA-seq data. This implementation achieves better precision, allows speed improvements on large datasets, as shown in experiments on TCGA data, and can be more easily interfaced with modern python-based data science tools.
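For readers who have not looked inside the DESeq2 workflow, one of its early steps is median-of-ratios normalization. The sketch below shows that single step in standard-library Python. It is not PyDESeq2 code and does not use its API; the function name and the input layout (a list of genes, each a list of per-sample counts) are assumptions chosen for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts, one per sample.
    Genes with a zero in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c <= 0 for c in gene):
            continue
        # log of the geometric mean of this gene across samples
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    # per sample: median ratio of its counts to the per-gene geometric mean
    return [math.exp(median(r)) for r in log_ratios]


if __name__ == "__main__":
    # the second sample was sequenced twice as deeply as the first
    print(size_factors([[10, 20], [100, 200], [5, 10], [0, 7]]))
```

Dividing each sample's counts by its size factor puts the samples on a comparable scale before dispersion estimation and testing.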
submitted by: Istvan Albert
GitHub - vembrane/vembrane: vembrane filters VCF records using python expressions (github.com)
vembrane allows to simultaneously filter variants based on any INFO or FORMAT field, CHROM, POS, ID, REF, ALT, QUAL, FILTER, and the annotation field ANN. When filtering based on ANN, annotation entries are filtered first. If no annotation entry remains, the entire variant is deleted.
vembrane relies on pysam for reading/writing VCF/BCF files.
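The general idea of filtering variant records with Python expressions can be sketched in a few lines of standard-library Python. This is not vembrane's implementation and does not use pysam; the helper names, the simplified parsing of the first eight VCF columns, and the expression syntax are assumptions made for illustration only.

```python
def _convert(value):
    """Turn an INFO value into int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_vcf_line(line):
    """Parse the eight fixed VCF columns of one data line into a dict."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, _, value = entry.partition("=")
        info_dict[key] = _convert(value) if value else True  # flags become True
    return {
        "CHROM": chrom,
        "POS": int(pos),
        "ID": vid,
        "REF": ref,
        "ALT": alt.split(","),
        "QUAL": None if qual == "." else float(qual),
        "FILTER": flt.split(";"),
        "INFO": info_dict,
    }


def keep_record(expression, record):
    """Evaluate a Python expression with the record's fields as variables.

    Builtins are removed, but eval is still not a sandbox: illustration only.
    A missing QUAL (None) will raise TypeError in numeric comparisons.
    """
    return bool(eval(expression, {"__builtins__": {}}, record))


if __name__ == "__main__":
    rec = parse_vcf_line("chr1\t100\t.\tA\tG\t50\tPASS\tDP=30;AF=0.5;DB")
    print(keep_record('QUAL >= 30 and INFO["DP"] > 10', rec))
```

The appeal of this design is that the filter language is just Python, so any combination of fields can be tested without learning a tool-specific syntax.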
submitted by: Istvan Albert
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac810/6909012?login=true
We present vembrane as a command line VCF/BCF filtering tool that consolidates and extends the filtering functionality of previous software to meet any imaginable filtering use case.
Some bold statements here: "to meet any imaginable filtering use case"
submitted by: Istvan Albert
GitHub - google/deepconsensus: DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data. (github.com)
DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.
submitted by: Istvan Albert
Strobealign: flexible seed size enables ultra-fast and accurate read alignment | Genome Biology | Full Text (genomebiology.biomedcentral.com)
We combine two such methods, syncmers and strobemers, in a novel seeding approach for constructing dynamic-sized fuzzy seeds and implement the method in a short-read aligner, strobealign. strobealign is several times faster than traditional aligners at similar and sometimes higher accuracy while being both faster and more accurate than more recently proposed aligners for short reads of lengths 150nt and longer. Availability:
submitted by: Istvan Albert
Excited to launch Quibbler, an open-source Python package for interactive data analysis. Fun to use. Nothing to learn. Your standard code effortlessly comes to life! With the amazing Maor Kern, @kmaork. https://t.co/u7znAYBfYA, #python, #data pic.twitter.com/BUd7JzJbF4
— Roy Kishony (@RoyKishony) December 12, 2022
submitted by: Istvan Albert
how_are_we_stranded_here: quick determination of RNA-Seq strandedness | BMC Bioinformatics | Full Text (bmcbioinformatics.biomedcentral.com)
How do you know that terminology (in this case, stranded RNA-seq) and metadata are hopelessly convoluted? You need software to tell you how your data was created ...
https://github.com/signalbash/how_are_we_stranded_here
The desperation is evident in the tool name: "how_are_we_stranded_here", which reads as if the authors were trying to atone for the sins of bioinformatics nomenclature.
I happen to recall a similar tool named "Guess My Library Type" written in the exact same spirit many years prior:
https://github.com/NBISweden/GUESSmyLT
And how does one guess their library type? The first step: install Docker ...
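The underlying logic of guessing strandedness is simple enough to sketch without Docker. Assume you have already counted how many reads agree with the annotated strand of the gene they map to and how many map to the opposite strand. The function below turns those two counts into a call. The function name, the labels, and all thresholds are assumptions made for illustration, not the rules used by either tool.

```python
def classify_strandedness(sense_reads, antisense_reads, threshold=0.9):
    """Classify a library from counts of reads on the sense and antisense strand.

    sense_reads: reads on the same strand as the annotated gene.
    antisense_reads: reads on the opposite strand.
    All cutoffs are hypothetical.
    """
    total = sense_reads + antisense_reads
    if total == 0:
        raise ValueError("no informative reads")
    sense_fraction = sense_reads / total
    if sense_fraction >= threshold:
        return "forward-stranded"
    if sense_fraction <= 1 - threshold:
        return "reverse-stranded"
    if 0.4 <= sense_fraction <= 0.6:
        return "unstranded"
    return "undetermined"  # neither clearly stranded nor clearly balanced


if __name__ == "__main__":
    print(classify_strandedness(950, 50))
    print(classify_strandedness(520, 480))
```

The hard part in practice is producing the two counts, which requires mapping a sample of reads against an annotated reference.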
submitted by: Istvan Albert
Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription