The Biostar Herald publishes user-submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here.
This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by Istvan Albert.
My colleague Jonn unintentionally coined the word “bioinformortician” this morning and I feel like it should be a real #bioinformatics specialty focused on identifying and laying to rest bad data
— Geraldine Van der Auwera 🏳️🌈 (@VdaGeraldine) September 11, 2021
submitted by: Istvan Albert
Metagenomic identification of viral sequences in laboratory reagents | bioRxiv (www.biorxiv.org)
These data suggest that the contamination of common laboratory reagents is likely widespread and can comprise a wide variety of viruses.
submitted by: Istvan Albert
Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays | BMC Bioinformatics | Full Text (bmcbioinformatics.biomedcentral.com)
Here we present a scalable and reproducible, cloud-based benchmarking workflow that is independent of the laboratory and the technician executing the workflow, or the underlying compute hardware used to rapidly and continually assess the performance of LDT assays, across their regions of interest and reportable range, using a broad set of benchmarking samples.
submitted by: Istvan Albert
Good enough practices in scientific computing (journals.plos.org)
This paper presents a set of good computing practices that every researcher can adopt, regardless of their current level of computational skill. These practices encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts.
submitted by: Istvan Albert
PRINCESS: comprehensive detection of haplotype resolved SNVs, SVs, and methylation | Genome Biology | Full Text (genomebiology.biomedcentral.com)
PRINCESS is a structured workflow that takes raw sequence reads and generates a fully phased SNV, SV, and methylation call set within a few hours. PRINCESS achieves high accuracy and long phasing even on low coverage datasets and can resolve repetitive, complex, medically relevant genes that often escape detection.
submitted by: Istvan Albert
DRAGEN reanalysis of the 1000 Genomes Dataset now available on the Registry of Open Data | AWS for Industries (aws.amazon.com)
This release (1kGP-DRAGEN) includes 2,504 unrelated samples from the 1000 Genomes Project phase 3 as well as an additional 698 related samples that complete 535 mother-father-child triads, funded by the NHGRI. The samples were all sequenced at >30x coverage using the Illumina NovaSeq 6000 system with 2x150bp reads. All 3,202 samples were realigned to hg38 using Illumina DRAGEN v3.5.7b, powered by the Illumina Analytics Platform (IAP) and AWS.
submitted by: Istvan Albert
CRAM 3.1: Advances in the CRAM Format | bioRxiv (www.biorxiv.org)
CRAM has established itself as a high compression alternative to the BAM file format for DNA sequencing data. We describe updates to further improve this on modern sequencing instruments. Results: With Illumina data CRAM 3.1 is 7 to 15% smaller than the equivalent CRAM 3.0 file, and 50 to 70% smaller than the corresponding BAM file.
submitted by: Istvan Albert
Biology must generate ideas as well as data (www.nature.com)
Data should be a means to knowledge, not an end in themselves.
submitted by: Istvan Albert
Want to get the Biostar Herald in your email? Who wouldn't? Sign up righ'ere: toggle subscription