Tutorial: introduce a tool for fast identification of SARS-CoV-2 and other microbes from sequencing data
Written 6 months ago by chen, OpenGene

These tools and resources have been published in Briefings in Bioinformatics: https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbaa231/5917007

fastv is an ultra-fast tool for identification of SARS-CoV-2 and other microbes from sequencing data. It detects microbial sequences in FASTQ data, generates JSON reports, and visualizes the results in HTML reports. It can be used to detect viral infectious diseases such as COVID-19, and it supports both short reads (Illumina, BGI, etc.) and long reads (ONT, PacBio, etc.).

fastv is an OpenGene project: https://github.com/OpenGene/fastv

how does it work?

fastv accepts FASTQ files as input, and then:

  1. performs data QC and quality filtering as fastp does (cuts adapters, removes low-quality reads, corrects wrong bases).
  2. scans the clean data to collect the sequences that contain any unique KMER, or that can be mapped to any of the reference microbial genomes.
  3. computes statistics, visualizes the results in HTML format, and outputs the results in JSON format.
  4. outputs the on-target sequencing reads so that they can be analyzed by downstream tools.
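
To make step 2 more concrete, here is a minimal Python sketch of unique-KMER scanning. It is only an illustration and not fastv's actual implementation: the function names, the toy KMERs and the toy reads are all made up for this example, it only does exact matching on both strands, and it leaves out the genome-mapping part entirely.

```python
# Illustrative sketch only: NOT fastv's real code. All names and data are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def read_hits(read, unique_kmers, k):
    """Return the set of unique KMERs found in a read, checking both strands."""
    hits = set()
    for strand in (read.upper(), reverse_complement(read)):
        for kmer in kmers(strand, k):
            if kmer in unique_kmers:
                hits.add(kmer)
    return hits


def scan_reads(reads, unique_kmers, k):
    """Collect on-target reads and count, per KMER, how many reads contain it."""
    on_target = []
    counts = {kmer: 0 for kmer in unique_kmers}
    for read in reads:
        hits = read_hits(read, unique_kmers, k)
        if hits:
            on_target.append(read)
            for kmer in hits:
                counts[kmer] += 1
    return on_target, counts


# Toy data: two 5-mers standing in for the unique KMERs of a target genome.
toy_kmers = {"ACGTA", "GGCCT"}
toy_reads = ["TTACGTATT", "AAAAAAAAA", "TTAGGCCAA"]
on_target, counts = scan_reads(toy_reads, toy_kmers, k=5)
print(on_target)  # ['TTACGTATT', 'TTAGGCCAA']
print(counts)
```

The third toy read only matches through its reverse complement, which is why both strands are checked.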

understand the input

fastv accepts the following files as input:

  1. (required) the FASTQ file to be scanned. It can be single-end (-i) or paired-end (-i and -I), and can contain short reads (Illumina, MGI, etc.) or long reads (PacBio, ONT, etc.).
  2. (optional) the Genomes file: a FASTA file containing one or many reference genomes of the target microorganism (-g).
  3. (optional) the KMER file: a FASTA file containing the unique KMERs of the target microbial genomes (-k).
  4. (optional) the KMER Collection file: a FASTA file containing the unique KMERs of many microorganisms (-c). See an example: http://opengene.org/kmer_collection.fasta

If none of the KMER, KMER Collection, or Genomes files is specified, fastv will try to load the SARS-CoV-2 Genomes/KMER files in the data folder to detect SARS-CoV-2.
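
If you launch fastv from a script, the options above can be assembled programmatically. The following Python sketch builds the argument list from the flags listed above (-i, -I, -g, -k, -c). The helper name `build_fastv_command` and the file names are hypothetical, and the sketch assumes, without confirmation from this post, that the optional flags can be combined freely.

```python
# Hypothetical helper: only the flags -i, -I, -g, -k and -c come from the post above.

def build_fastv_command(read1, read2=None, genomes=None, kmer=None,
                        kmer_collection=None, executable="./fastv"):
    """Build a fastv argument list; optional inputs are added only when given."""
    cmd = [executable, "-i", read1]
    if read2 is not None:            # paired-end data: second FASTQ goes to -I
        cmd += ["-I", read2]
    if genomes is not None:          # reference genomes FASTA
        cmd += ["-g", genomes]
    if kmer is not None:             # unique KMER FASTA
        cmd += ["-k", kmer]
    if kmer_collection is not None:  # KMER collection FASTA
        cmd += ["-c", kmer_collection]
    return cmd


# With no optional files, fastv falls back to its bundled SARS-CoV-2 data.
single = build_fastv_command("testdata.fq.gz")
# Example file names below are placeholders.
paired = build_fastv_command("R1.fq.gz", read2="R2.fq.gz",
                             kmer_collection="kmer_collection.fasta")
print(" ".join(single))  # ./fastv -i testdata.fq.gz
print(" ".join(paired))  # ./fastv -i R1.fq.gz -I R2.fq.gz -c kmer_collection.fasta
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once fastv is installed.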

take a quick glance at the informative report

try fastv to generate the above reports

quick examples

Single-end data

./fastv -i testdata.fq.gz

Paired-end data

./fastv -i R1.fq.gz -I R2.fq.gz

You can download KMER files and Genome files of viruses from http://opengene.org/uniquekmer/virus/index.html. These files were generated by extracting the unique KMERs for every genome in a big FASTA file (http://opengene.org/viral.genomic.fasta), which contains all the complete viral sequences of the NCBI RefSeq release, available from https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/. KMERs that can be mapped to the human reference genome (GRCh38) with edit_distance <= 3 have already been filtered out.
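
As a rough picture of what "unique KMER" means here, the Python sketch below keeps, for each genome in a collection, only the KMERs that occur in no other genome. This is a simplified illustration and not the UniqueKMER algorithm: the function name and the toy genomes are invented, reverse complements are ignored, and the filtering against the human reference genome is left out.

```python
# Simplified illustration of unique-KMER extraction; NOT the UniqueKMER implementation.
from collections import defaultdict


def unique_kmers_per_genome(genomes, k):
    """Map each genome name to the set of KMERs found in that genome only."""
    owners = defaultdict(set)  # KMER -> names of the genomes containing it
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    result = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:    # seen in exactly one genome, so it is unique
            (name,) = names
            result[name].add(kmer)
    return result


# Toy genomes (hypothetical); real viral genomes are thousands of bases long.
toy_genomes = {"virusA": "AAACCC", "virusB": "AACCCG"}
unique = unique_kmers_per_genome(toy_genomes, k=4)
print(unique)  # {'virusA': {'AAAC'}, 'virusB': {'CCCG'}}
```

The shared KMERs AACC and ACCC are dropped because they cannot tell the two toy genomes apart.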

You can download the KMER collection file for viral genomes from: http://opengene.org/virus.kmer_collection.fasta.gz

If you want to generate your own unique KMER files and KMER collection files, please use UniqueKMER: https://github.com/OpenGene/UniqueKMER

For more information, see: https://github.com/OpenGene/fastv
