To better compare de novo assemblers for metagenomic analysis, LMAS (Last Metagenomic Assembler Standing) was developed as a flexible platform allowing users to evaluate assembler performance given known standard communities. [...] Some assemblers still in use, such as ABySS, MetaHipmer2, minia, and VelvetOptimiser, perform relatively poorly and should be used with caution when assembling complex samples [...] No single assembler appeared an ideal choice for short-read metagenomic prokaryote replicon assembly, each showing specific strengths.

submitted by: Istvan Albert

RNA velocity unraveled | PLOS Computational Biology (journals.plos.org)

Follow-up to the RNA modeling article (https://www.nature.com/articles/s41467-022-34857-7) linked in the current Biostar Herald; it covers models for RNA velocity.

submitted by: LChart

CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure | bioRxiv (www.biorxiv.org)

The construction of the new CHESS 3 database employed improved transcript assembly algorithms, a new machine learning classifier, and protein structure predictions to identify genes and transcripts likely to be functional and to eliminate those that appeared more likely to represent noise. The new catalog contains 41,356 genes on the GRCh38 reference human genome, of which 19,839 are protein-coding, and a total of 158,377 transcripts. These include 14,863 novel protein-coding transcripts. The total number of transcripts is substantially smaller than earlier versions due to improved transcriptome assembly methods and to a stricter protocol for filtering out noisy transcripts. Notably, CHESS 3 contains all of the transcripts in the MANE database, and at least one transcript corresponding to the vast majority of protein-coding genes in the RefSeq and GENCODE databases. CHESS 3 has also been mapped onto the complete CHM13 human genome, which gives a more-complete gene count of 43,773 genes and 19,968 protein-coding genes. The CHESS database is available at http://ccb.jhu.edu/chess.

submitted by: Istvan Albert

GitHub - ekimb/mapquik: Efficient low-divergence mapping of long reads in minimizer space (github.com)

mapquik is an ultra-fast read mapper based on k-min-mers (matches of consecutively-sampled minimizers). It aligns long and accurate reads such as PacBio HiFi to a reference genome.

submitted by: Istvan Albert
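
For readers unfamiliar with the term in the mapquik entry above, here is a minimal sketch of the idea: sample minimizers from a sequence, group runs of k consecutive minimizers into tuples, and look for tuples shared between a read and a reference. This is not mapquik's code. The function names, the lexicographic window-minimizer scheme, and the toy parameters (l, w, k) are all assumptions chosen for illustration; the real tool may sample minimizers differently.

```python
# Illustrative sketch only; names and the minimizer scheme are assumptions,
# not mapquik's implementation.

def minimizers(seq, l, w):
    """Pick the lexicographically smallest l-mer in every window of w l-mers."""
    lmers = [seq[i:i + l] for i in range(len(seq) - l + 1)]
    picked = []
    for start in range(len(lmers) - w + 1):
        window = lmers[start:start + w]
        # min() returns the leftmost smallest l-mer in the window
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        # adjacent windows often pick the same position; record it once
        if not picked or picked[-1][0] != pos:
            picked.append((pos, lmers[pos]))
    return picked


def k_min_mers(seq, l, w, k):
    """Tuples of k consecutive minimizers, tagged with the first one's position."""
    mins = minimizers(seq, l, w)
    return [
        (mins[i][0], tuple(m for _, m in mins[i:i + k]))
        for i in range(len(mins) - k + 1)
    ]


def shared_k_min_mers(read, ref, l=3, w=4, k=2):
    """Return (read_pos, [ref_positions]) for every k-min-mer found in both."""
    index = {}
    for pos, kmm in k_min_mers(ref, l, w, k):
        index.setdefault(kmm, []).append(pos)
    return [
        (rpos, index[kmm])
        for rpos, kmm in k_min_mers(read, l, w, k)
        if kmm in index
    ]


ref = "ACGTTGCATGCCGATAGCTAGGCTA"
read = ref[5:20]  # an error-free "read" taken from the reference
hits = shared_k_min_mers(read, ref)
```

Each hit pairs a position in the read with candidate positions in the reference. A real mapper would chain such hits into an alignment and handle both strands, which this sketch leaves out.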

Ultrafast prediction of somatic structural variations by filtering out reads matched to pan-genome k-mer sets | Nature Biomedical Engineering (www.nature.com)

Here we describe an ultrafast and accurate detector of somatic structural variations that reduces read-mapping costs by filtering out reads matched to pan-genome k-mer sets.

When benchmarked against six callers on reference cell-free DNA, validated biomarkers of structural variants, matched tumour and normal whole genomes, and tumour-only targeted sequencing datasets, ETCHING was 11-fold faster than the second-fastest structural-variant caller at comparable performance and memory use.

submitted by: Istvan Albert
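
The filtering step described in the ETCHING entry above can be pictured with a small sketch: build a set of reference k-mers, then keep only reads that contain at least one k-mer missing from that set, so fewer reads reach the mapping stage. This is a simplified illustration and not ETCHING's code. The function names and the toy value of k are hypothetical, and reverse complements are ignored.

```python
# Illustrative sketch only; function names and k are assumptions,
# not ETCHING's implementation.

def kmers(seq, k):
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_reference_kmers(sequences, k):
    """Union of k-mers over a collection of reference sequences."""
    ref = set()
    for s in sequences:
        ref |= kmers(s, k)
    return ref


def filter_reads(reads, reference_kmers, k):
    """Keep reads carrying at least one k-mer absent from the reference set.

    Reads fully explained by the reference k-mers are dropped.
    Reads shorter than k have no k-mers and are dropped as well.
    """
    kept = []
    for read in reads:
        if any(km not in reference_kmers for km in kmers(read, k)):
            kept.append(read)
    return kept


reference = build_reference_kmers(["ACGTACGTAC"], k=4)
candidates = filter_reads(["ACGTAC", "ACGTTT"], reference, k=4)
```

Here the first read consists entirely of reference k-mers and is filtered out, while the second carries novel k-mers and is kept for downstream analysis.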

