Tutorial: introducing fastv, a tool for fast identification of SARS-CoV-2 and other microbes from sequencing data
Posted 9 weeks ago by chen (OpenGene)

NEW: fastv 0.6.0 released, which supports scanning a KMER collection file that contains KMERs of many microorganisms.

fastv is an ultra-fast tool for identifying SARS-CoV-2 and other microbes in sequencing data. It detects microbial sequences in FASTQ data, generates JSON reports, and visualizes the results in HTML reports. It can be used to detect viral infectious diseases such as COVID-19, and it supports both short reads (Illumina, BGI, etc.) and long reads (ONT, PacBio, etc.).

fastv is an OpenGene project: https://github.com/OpenGene/fastv

how does it work?

fastv accepts FASTQ files as input, and then:

  1. performs data QC and quality filtering as fastp does (cuts adapters, removes low-quality reads, corrects wrong bases).
  2. scans the clean data to collect the sequences that contain any unique KMER or that can be mapped to any of the reference microbial genomes.
  3. computes statistics, visualizes the results in HTML format, and outputs the results in JSON format.
  4. outputs the on-target sequencing reads so that they can be analyzed by downstream tools.
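
To make the scanning step (step 2) more concrete, here is a minimal Python sketch of the idea of collecting reads that contain at least one unique KMER. It is only a conceptual illustration under simplifying assumptions (exact matching, a toy k of 5, both strands checked); it is not fastv's actual implementation, and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_kmer_hits(read, unique_kmers, k):
    """Count read positions whose k-mer, on either strand, is in the unique set."""
    hits = 0
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in unique_kmers or reverse_complement(kmer) in unique_kmers:
            hits += 1
    return hits


def select_on_target(reads, unique_kmers, k):
    """Return the names of reads that contain at least one unique k-mer."""
    return [name for name, seq in reads.items()
            if count_kmer_hits(seq, unique_kmers, k) > 0]


# Toy data (hypothetical): real KMER files use much longer k-mers.
unique_kmers = {"ACGTA", "GGCCT"}
reads = {
    "read1": "TTACGTATT",  # contains ACGTA directly
    "read2": "AAAAAAAAA",  # contains no unique k-mer
    "read3": "CCAGGCCAA",  # contains AGGCC, the reverse complement of GGCCT
}
on_target = select_on_target(reads, unique_kmers, k=5)
print(on_target)
```

Here `read1` and `read3` are kept as on-target reads, while `read2` is discarded.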

understand the input

fastv accepts the following files as input:

  1. (required) the FASTQ file to be scanned, can be single-end (-i) or paired-end (-i and -I), can be short reads (Illumina, MGI, etc.) or long reads (PacBio, ONT, etc.)
  2. (optional) the Genomes file: a FASTA file containing one or many reference genomes of the target microorganism (-g).
  3. (optional) the KMER file: a FASTA file containing the unique KMERs of the target microbial genomes (-k).
  4. (optional) the KMER Collection file: a FASTA containing the unique KMERs of many microorganisms (-c). See an example: http://opengene.org/kmer_collection.fasta

If none of the KMER, KMER Collection, or Genomes files is specified, fastv will try to load the SARS-CoV-2 Genomes/KMER files in the data folder to detect SARS-CoV-2.
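
If you launch fastv from a script, the input options above map naturally onto an argument list. The following Python sketch assembles such a list using only the options named in this post (-i, -I, -g, -k, -c). The helper name `build_fastv_command` and the file names are hypothetical, and whether several of the optional files can be combined in one run is not stated here, so check fastv's own help before relying on that.

```python
import shlex


def build_fastv_command(in1, in2=None, genomes=None, kmer=None,
                        kmer_collection=None, fastv="./fastv"):
    """Assemble a fastv argument list from the inputs described above."""
    cmd = [fastv, "-i", in1]           # required: FASTQ file to scan
    if in2:
        cmd += ["-I", in2]             # second file for paired-end data
    if genomes:
        cmd += ["-g", genomes]         # optional Genomes file (FASTA)
    if kmer:
        cmd += ["-k", kmer]            # optional KMER file (FASTA)
    if kmer_collection:
        cmd += ["-c", kmer_collection] # optional KMER Collection file (FASTA)
    return cmd


# Paired-end data scanned against a KMER collection (placeholder file names).
cmd = build_fastv_command("R1.fq.gz", in2="R2.fq.gz",
                          kmer_collection="kmer_collection.fasta")
print(shlex.join(cmd))
```

Once fastv is built, the list can be passed to `subprocess.run(cmd, check=True)`. With only the first argument given, the helper produces the same command as the single-end quick example, so fastv falls back to its bundled SARS-CoV-2 files.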

take a quick glance at the informative report

try fastv to generate the above reports

quick examples

Single-end data

./fastv -i testdata.fq.gz

Paired-end data

./fastv -i R1.fq.gz -I R2.fq.gz

You can download KMER files and Genome files of viruses from http://opengene.org/uniquekmer/virus/index.html. These files were generated by extracting the unique KMERs of every genome in a big FASTA (http://opengene.org/viral.genomic.fasta), which contains the complete NCBI RefSeq release of viral sequences, available from https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/. KMERs that can be mapped to the human reference genome (GRCh38) with edit_distance <= 3 have already been filtered out.
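
To illustrate what "unique KMER" means here, the following Python sketch keeps, for each genome in a set, only the k-mers that occur in no other genome. It is a simplified illustration with toy sequences and hypothetical function names, not the UniqueKMER implementation: it ignores the reverse strand and does not perform the filtering against the human reference genome.

```python
from collections import defaultdict


def kmers_of(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers_per_genome(genomes, k):
    """Map each genome name to the k-mers found in that genome only."""
    owners = defaultdict(set)  # k-mer -> names of genomes containing it
    for name, seq in genomes.items():
        for kmer in kmers_of(seq, k):
            owners[kmer].add(name)

    unique = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:  # seen in exactly one genome
            unique[next(iter(names))].add(kmer)
    return unique


# Toy genomes (hypothetical): CGTA and GTAC are shared, so they are dropped.
genomes = {"virusA": "ACGTAC", "virusB": "CGTACC"}
unique = unique_kmers_per_genome(genomes, k=4)
print(unique)
```

In this toy case only `ACGT` remains for `virusA` and only `TACC` for `virusB`.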

You can download the KMER collection file for viral genomes from: http://opengene.org/virus.kmer_collection.fasta.gz

If you want to generate your own unique KMER files and KMER collection files, please use UniqueKMER: https://github.com/OpenGene/UniqueKMER

screenshot

[image]

For more information, see https://github.com/OpenGene/fastv

Modified 4 weeks ago • written 9 weeks ago by chen

Wow great information. COVID-19

written 7 weeks ago by Prakash Das