6 weeks ago
Biostar 850

The Biostar Herald publishes user submitted links of bioinformatics relevance.

This edition of the Herald was brought to you by contributions from GenoMax, Istvan Albert, and Wayne, and was edited by Istvan Albert.

Single Cell Genomics Day | Satija Lab (satijalab.org)

Single Cell Genomics Day: A (Virtual) Practical Workshop

submitted by: Istvan Albert

Home | Gosling (gosling-lang.org)

A Grammar-based Toolkit for Scalable and Interactive Genomics Data Visualization

submitted by: Istvan Albert

GitHub - malonge/RagTag: Tools for fast and flexible genome assembly scaffolding and improvement (github.com)

RagTag is a collection of software tools for scaffolding and improving modern genome assemblies. Tasks include:

submitted by: Istvan Albert

High Performance Long Read Assay Enables Contiguous Data up to 10Kb on Existing Illumina Platforms (www.illumina.com)

Illumina long reads. Not 100-200 kb but still.

submitted by: GenoMax

RNA-seq | Griffith Lab (rnabio.org)

We have therefore developed this course to provide an introduction to RNA-seq and scRNA-seq data analysis concepts followed by integrated tutorials demonstrating the use of popular bioinformatics analysis packages. The tutorials are designed as self-contained units that include example data (Illumina paired-end RNA-seq data) and detailed instructions for installation of all required bioinformatics tools (HISAT, StringTie, Kallisto, etc.).

submitted by: Istvan Albert

Describes the new features of ffq (Fetch FastQ), which is a command line tool for finding sequencing data and metadata from public databases like SRA / GEO / EMBL-EBI / NCBI / NIH Biosample / DDBJ / ENCODE.

ffq is specifically designed to download metadata and to facilitate obtaining links to sequence files. To download raw data from the links obtained with ffq you can use one of the following:

  • cURL and wget for FTP links,
  • aws for AWS links,
  • gsutil for GCP links,
  • fasterq-dump for converting SRA files to FASTQ files.
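
As a rough illustration of the list above, the choice of download tool can be made from the URL scheme. This is a minimal sketch, not part of ffq: the function name download_command is hypothetical, and it assumes FTP links start with ftp://, AWS links with s3://, and GCP links with gs://. It only builds the command and does not run it.

```python
from urllib.parse import urlparse


def download_command(url, dest="."):
    """Return an argv list for fetching `url` into `dest`; nothing is executed.

    Hypothetical helper: the scheme-to-tool mapping is an assumption
    based on the tools listed above.
    """
    scheme = urlparse(url).scheme.lower()
    if scheme == "ftp":
        # wget -P sets the directory the file is saved into
        return ["wget", "-P", dest, url]
    if scheme == "s3":
        return ["aws", "s3", "cp", url, dest]
    if scheme == "gs":
        return ["gsutil", "cp", url, dest]
    raise ValueError(f"no downloader known for URL scheme {scheme!r}")
```

The returned list can be handed to subprocess.run(cmd, check=True). An .sra file fetched this way would still need fasterq-dump to be converted to FASTQ.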

Related to that, part of the current post illustrates how the JSON structure makes it easy to load the results into a pandas DataFrame, with one column being the URL.
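
A minimal sketch of that idea follows. The JSON below is made up for illustration: the accessions, field names, and URLs are placeholders and are not guaranteed to match what ffq emits. The only assumption that matters is that the output is a list of records, each carrying a url field.

```python
import json

import pandas as pd

# Hypothetical ffq-style output: a JSON list with one record per file.
# Field names and values are placeholders, not real ffq output.
raw = """
[
  {"accession": "RUN_A", "filetype": "fastq", "filenumber": 1,
   "url": "ftp://example.org/RUN_A_1.fastq.gz"},
  {"accession": "RUN_A", "filetype": "fastq", "filenumber": 2,
   "url": "ftp://example.org/RUN_A_2.fastq.gz"}
]
"""

records = json.loads(raw)

# A list of flat dicts maps directly onto rows and columns.
df = pd.DataFrame(records)

# The URL column can then be filtered or passed on to a downloader.
urls = df["url"].tolist()
```

If the records were nested rather than flat, pandas.json_normalize would be the more suitable entry point.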

ffq aims to make reproducible analysis straightforward by making piping and parsing convenient.

submitted by: Wayne

Journal of Open Source Software: ROCK: digital normalization of whole genome sequencing data (joss.theoj.org)

Due to advances in high-throughput sequencing technologies, generating whole genome sequencing (WGS) data with high coverage depth (e.g. ≥ 500×) is now becoming common, especially when dealing with non-eukaryotic genomes. Such high coverage WGS data often fulfills the expectation that most nucleotide positions of the genome are sequenced a sufficient number of times without error. However, performing bioinformatic analyses (e.g. sequencing error correction, whole genome de novo assembly) on such highly redundant data requires substantial running times and memory footprint.

submitted by: Istvan Albert

