Metagenomic identification of viral sequences in laboratory reagents | bioRxiv (www.biorxiv.org)

These data suggest that the contamination of common laboratory reagents is likely widespread and can comprise a wide variety of viruses.

Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays | BMC Bioinformatics | Full Text (bmcbioinformatics.biomedcentral.com)

Here we present a scalable and reproducible, cloud-based benchmarking workflow that is independent of the laboratory, the technician executing the workflow, and the underlying compute hardware. It can be used to rapidly and continually assess the performance of LDT assays, across their regions of interest and reportable range, using a broad set of benchmarking samples.

Good enough practices in scientific computing (journals.plos.org)

This paper presents a set of good computing practices that every researcher can adopt, regardless of their current level of computational skill. These practices encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts.

PRINCESS: comprehensive detection of haplotype resolved SNVs, SVs, and methylation | Genome Biology | Full Text (genomebiology.biomedcentral.com)

PRINCESS is a structured workflow that takes raw sequence reads and generates a fully phased SNV, SV, and methylation call set within a few hours. PRINCESS achieves high accuracy and long phasing even on low coverage datasets and can resolve repetitive, complex, medically relevant genes that often escape detection.

DRAGEN reanalysis of the 1000 Genomes Dataset now available on the Registry of Open Data | AWS for Industries (aws.amazon.com)

This release (1kGP-DRAGEN) includes 2,504 unrelated samples from the 1000 Genomes Project phase 3 as well as an additional 698 related samples that complete 535 mother-father-child triads, funded by the NHGRI. The samples were all sequenced at >30x coverage using the Illumina NovaSeq 6000 system with 2x150bp reads. All 3,202 samples were realigned to hg38 using Illumina DRAGEN v3.5.7b, powered by the Illumina Analytics Platform (IAP) and AWS.

CRAM 3.1: Advances in the CRAM Format | bioRxiv (www.biorxiv.org)

CRAM has established itself as a high compression alternative to the BAM file format for DNA sequencing data. We describe updates to further improve this on modern sequencing instruments. Results: With Illumina data CRAM 3.1 is 7 to 15% smaller than the equivalent CRAM 3.0 file, and 50 to 70% smaller than the corresponding BAM file.

Biology must generate ideas as well as data (www.nature.com)

Data should be a means to knowledge, not an end in themselves.

