Herald: The Biostar Herald for Monday, November 20, 2023
The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.

This edition of the Herald was brought to you by contributions from Istvan Albert, and was edited by Istvan Albert.

GitHub - hasindu2008/slow5lib: slow5lib is a software library for reading & writing SLOW5 files. (github.com)

SLOW5 is a new file format for storing signal data from Oxford Nanopore Technologies (ONT) devices. SLOW5 was developed to overcome inherent limitations in the standard FAST5 signal data format. SLOW5 can be encoded in human-readable ASCII format, or a more compact and efficient binary format (BLOW5).

Gotcha, from FAST5 we go to SLOW5 and BLOW5. No, seriously, is this madness? No, this is Bioinformaaaatics.

scReadSim: a single-cell RNA-seq and ATAC-seq read simulator | Nature Communications (www.nature.com)

We introduce scReadSim, a single-cell RNA-seq and ATAC-seq read simulator that allows user-specified ground truths and generates synthetic sequencing reads (in a FASTQ or BAM file) by mimicking real data. At both read-sequence and read-count levels, scReadSim mimics real scRNA-seq and scATAC-seq data. Moreover, scReadSim provides ground truths, including unique molecular identifier (UMI) counts for scRNA-seq and open chromatin regions for scATAC-seq.

Centrifuger: lossless compression of microbial genomes for efficient and accurate metagenomic sequence classification | bioRxiv (www.biorxiv.org)

Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database.

GitHub - mourisl/centrifuger: Classifier for metagenomic sequences using FM-index with run-block compressed BWT. (github.com)

Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. It implements a novel lossless compression method, run-block compressed BWT, and other strategies to efficiently reduce the size of microbial genome databases such as RefSeq. For example, Centrifuger can classify reads against the 2023 RefSeq prokaryotic genomes containing about 140G nucleotides using 43 GB memory. Despite running on a compressed data structure, Centrifuger is also highly efficient and can process a typical sequencing sample within an hour.
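
For readers wondering why a BWT lends itself to compression, here is a toy sketch. It is not Centrifuger's run-block compressed BWT, whose details are in the preprint; it only shows the general idea that the Burrows-Wheeler transform tends to group identical characters into runs, which can then be stored as (character, count) pairs. The function names `bwt`, `run_length_encode` and `run_length_decode` are made up for this illustration, and the rotation-sorting approach is only practical for very short strings.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (toy version, short strings only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Sort all cyclic rotations and keep the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s):
    """Collapse consecutive identical characters into (char, count) pairs."""
    return [(ch, sum(1 for _ in grp)) for ch, grp in groupby(s)]


def run_length_decode(runs):
    """Inverse of run_length_encode; shows that the encoding is lossless."""
    return "".join(ch * n for ch, n in runs)


if __name__ == "__main__":
    transformed = bwt("banana")
    runs = run_length_encode(transformed)
    print(transformed)  # annb$aa
    print(runs)         # [('a', 1), ('n', 2), ('b', 1), ('$', 1), ('a', 2)]
    assert run_length_decode(runs) == transformed
```

On repetitive input such as collections of closely related genomes the runs tend to get much longer than in this tiny example, which is where the space savings would come from.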

GitHub - vcflib/vcflib: C++ library and cmdline tools for parsing and manipulating VCF files with python and zig bindings (github.com)

This is vcflib's first Humpty Dumpty release: vcfcreatemulti is the natural companion to vcfwave. Often variant callers are not perfect. vcfwave with its companion tool vcfcreatemulti can take an existing VCF file that contains multiple complex overlapping and even nested alleles and, unlike Humpty Dumpty, take them apart and put them together again. Thereby, hopefully, creating sane VCF output that is useful for analysis and getting rid of false positives.

A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar | PLOS Computational Biology (journals.plos.org)

Here we present a spectrum of over 125 useful, complementary free and open source software tools and libraries that we wrote and made available through the multiple vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as output of statistics, visualisation, and transformations of files variants. These tools run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem and we highlight best practices.

