About five months ago, I first introduced a virus identification pipeline that was still under development. (C: Shall we perform a biological experiment such as PCR to confirm the bioinformatics analysis?) The core parts of the pipeline are now complete, and we have put two sets of results online, produced by running the pipeline on biological NGS datasets generated with 454 and HiSeq. We look forward to your feedback.
If you have any questions or suggestions, please do not hesitate to contact me at email@example.com
The following is a basic introduction to the pipeline:
Abstract: Next-generation sequencing (NGS) approaches enable detection of a wide spectrum of viral pathogens in clinical samples. However, practical application of the technology is limited by the bioinformatics challenge of analyzing results accurately and within a clinically relevant timeframe. Here we describe VIP (“Virus Identification Pipeline”), a computational pipeline for viral pathogen identification from clinical metagenomic NGS data. VIP performs the following steps to achieve virus identification: (i) mapping and filtering out of background-related reads, (ii) extensive classification of reads using Bowtie2 and RAPSearch, and (iii) de novo assembly of candidate reads followed by phylogenetic analysis to provide evolutionary insight. We demonstrate use of the pipeline in the analysis of clinical samples from the Guangzhou outbreak of dengue fever and of public data sets comprising more than 1 billion sequences. VIP has also contributed to virus discovery, demonstrating its potential as a generic tool.
VIP flowchart: Raw NGS reads are first preprocessed by removing adapter, low-quality, and low-complexity sequences, followed by computational subtraction of human reads using Bowtie2. In fast mode, viruses are identified by Bowtie2 alignment to the ViPR/IRD nucleotide database. In sense mode, bacterial reads are removed and the remaining reads are aligned to the virus database. Unmatched reads and contigs generated by de novo assembly are then aligned to a viral protein database (ViPR/IRD collection) using RAPSearch. For phylogenetic analysis, the largest contig is aligned using MAFFT to a backbone constructed from RefSeq-standard sequences. VIP reports include a summary table of classified reads with taxonomic assignments and genomic coverage percentages. In addition, the results of the phylogenetic analysis and a genomic coverage map are attached.
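To make the "genomic coverage percentage" in the report concrete, here is a minimal Python sketch of one common way to compute it: merge the aligned read intervals on a reference genome and divide the covered length by the genome length. This is not taken from the VIP source code. The function name `coverage_percent`, the half-open 0-based interval convention, and the example numbers are all assumptions made for illustration.

```python
def coverage_percent(alignments, genome_length):
    """Percentage of reference positions covered by at least one read.

    alignments: iterable of (start, end) pairs, 0-based and half-open
                (hypothetical convention chosen for this sketch).
    genome_length: length of the reference genome in bases.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")

    covered = 0
    current_start = current_end = None

    # Sort by start position, then merge overlapping or adjacent intervals.
    for start, end in sorted(alignments):
        # Clip each interval to the reference boundaries.
        start, end = max(start, 0), min(end, genome_length)
        if start >= end:
            continue  # empty or fully out-of-range alignment
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap or adjacency: extend the current merged block.
            current_end = max(current_end, end)

    if current_end is not None:
        covered += current_end - current_start

    return 100.0 * covered / genome_length


if __name__ == "__main__":
    # Made-up alignments on a made-up 100-base reference.
    reads = [(0, 50), (25, 75), (90, 100)]
    print(coverage_percent(reads, 100))
```

Merging intervals first means that bases covered by several reads are counted once, so the value reflects breadth of coverage rather than sequencing depth.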
One-touch command (sh VIP.sh <NGS file> <platform>)
Multiple sequencing platforms supported (454, Ion Torrent, Illumina)
Low hardware requirements (8 GB of memory is sufficient)