The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contributions from Istvan Albert and was edited by Istvan Albert.
Which programming language should I use? A guide for early-career researchers (www.nature.com)
Computer scientists and bioinformaticians address four key questions to help rookie coders to make the right choice.
submitted by: Istvan Albert
GitHub - ArcInstitute/xsra: An efficient CLI to extract sequences from the SRA (github.com)
A performant and storage-efficient CLI tool to extract sequences from an SRA archive with support for FASTA, FASTQ, and BINSEQ outputs.
submitted by: Istvan Albert
Why is bwa-aln used for ancient DNA reads? (lh3.github.io)
[...] plot suggests for reads shorter than 60bp, bwa-aln is more sensitive to mutations or sequencing/deamination errors than bwa-mem and bowtie2. Given that aDNA reads are short, the higher sensitivity at the short end will help to alleviate reference biases and improve variant calling. This is why bwa-aln is still used for aDNA data.
submitted by: Istvan Albert
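The 60bp cutoff quoted above suggests a quick check of how many reads in a library actually fall in the short range. The following is only a minimal sketch, assuming an uncompressed four-line FASTQ; the function names `read_lengths` and `short_fraction` and the toy reads are hypothetical and not taken from the linked post.

```python
import io


def read_lengths(handle):
    """Yield the sequence length of each record in a four-line FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of every record
            yield len(line.strip())


def short_fraction(handle, cutoff=60):
    """Return the fraction of reads strictly shorter than `cutoff` bases."""
    total = 0
    short = 0
    for n in read_lengths(handle):
        total += 1
        if n < cutoff:
            short += 1
    return short / total if total else 0.0


# Hypothetical toy data: one 30bp read and one 70bp read.
record_1 = "@r1\n" + "A" * 30 + "\n+\n" + "I" * 30 + "\n"
record_2 = "@r2\n" + "C" * 70 + "\n+\n" + "I" * 70 + "\n"
demo = io.StringIO(record_1 + record_2)

print(short_fraction(demo))  # fraction of reads below the assumed 60bp cutoff
```

A high fraction would indicate that most of the library sits in the read-length range the post discusses; the choice of aligner itself remains a judgment call.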
Big names or big ideas: Do peer-review panels select the best science proposals? (www.science.org)
A key issue in the economics of science is finding effective mechanisms for innovation. A concern about research grants and other research and development subsidies is that the public sector may make poor decisions about which projects to fund. Despite its importance, especially for the advancement of basic and early-stage science, there is currently no large-scale empirical evidence on how successfully governments select research investments. Li and Agha analyze more than 130,000 grants funded by the U.S. National Institutes of Health during 1980–2008 and find clear benefits of peer evaluations, particularly for distinguishing high-impact potential among the most competitive applications.
submitted by: Istvan Albert
Assemblies of long-read metagenomes suffer from diverse errors | bioRxiv (www.biorxiv.org)
Genomes from metagenomes have revolutionised our understanding of microbial diversity, ecology, and evolution, propelling advances in basic science, biomedicine, and biotechnology. Assembly algorithms that take advantage of increasingly available long-read sequencing technologies bring the recovery of complete genomes directly from metagenomes within reach. However, assessing the accuracy of the assembled long reads, especially from complex environments that often include poorly studied organisms, poses remarkable challenges. Here we show that erroneous reporting is pervasive among long-read assemblers and can take many forms, including multi-domain chimeras, prematurely circularized sequences, haplotyping errors, excessive repeats, and phantom sequences. Our study highlights the need for rigorous evaluation of the algorithms while they are in development, and options for users who may opt for more accurate reads than shorter runtimes.
submitted by: Istvan Albert
Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription
ArcInstitute also has the BinSeq format (https://github.com/ArcInstitute/binseq). The trio of binseq, xsra, and SRAgent (https://github.com/ArcInstitute/SRAgent) is probably a killer combo for massive data ingestion!