Dear all,
I have some questions about the filtering and annotation steps that are usually used to retrieve causal variants and to extract the most important candidate genes likely to carry mutations in tumor samples. I designed an exome sequencing analysis pipeline after reading through several pipelines available online and adapting them to my experimental design. With it I have called the variants, which I now want to annotate with ANNOVAR and filter down to the potentially mutated genes. The catch is that I do not have any normal samples. My experimental design so far is to analyze the exome data of the tumor sample and of the iPSC line derived from the same tumor. I therefore analyzed the tumor and its corresponding iPSC line separately and annotated each of them with ANNOVAR. The commands I used for ANNOVAR are:
#### Annotations using annovar
### Conversion to annovar file format
perl5.8.8 /data/PGP/exome/annovar/convert2annovar.pl \
    /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.recal.snps.vcf \
    -format vcf4 \
    --outfile /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.recal.snps.vcf.annovar \
    -includeinfo
###### Final annotation
perl5.8.8 /data/PGP/exome/annovar30_01_2013/summarize_annovar.pl \
    -veresp 6500 -ver1000g 1000g2012apr -buildver hg19 -verdbsnp 137 \
    /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.recal.snps.vcf.annovar \
    /data/PGP/exome/annovar30_01_2013/humandb \
    -outfile /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999_snps \
    -step 1-9
I did the same for the iPSC line. This gave me a list of over 5000 mutated genes, and I want to compare the tumor with its iPSC line to check whether the genetic landscape is maintained between the two. However, I suspect that 5000 genes is far too many. Since I have no normal samples, I cannot apply the subtraction method, i.e. discard the variants that are also present in the matched normal sample. So I would like to ask whether there is a protocol for filtering the nonsynonymous and synonymous SNVs obtained from the ANNOVAR step, to reduce the gene list to the more likely candidates before comparing the tumor and its iPSC line.

I see that ANNOVAR includes another program, variants_reduction.pl, which can be used for this. Does anyone have experience with it, or is there a standard filtering method that can be applied to the output of the final annotation step above? So far the only prediction score I can see is AVSIFT. I could rank the variants on it and keep only the genes carrying variants with an AVSIFT score below 0.05. Does this idea sound reasonable? I am not looking for novel mutations at the moment, so I am not removing the variants that are found in dbSNP and 1000 Genomes. Also, is there a way to check the MAF values in ANNOVAR, set a stringent threshold, and keep only the genes whose variants have a MAF below that threshold?

I would be thankful if anyone could share the annotation and filtering approach they have used in their own exome analysis. This field is new to me, so I may be wrong in places; please feel free to correct me and point me in the right direction. It would be great if anyone could share a script for filtering after ANNOVAR, or the variants_reduction.pl command with the parameters they use.
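
For illustration, the kind of post-ANNOVAR filter I have in mind could be sketched in Python as follows. This is only a rough sketch and not a validated protocol. The column names (Gene, ExonicFunc, AVSIFT, 1000g2012apr_ALL, ESP6500_ALL) are my assumptions about the header of the summary CSV and should be checked against the actual file. The MAF cutoff of 0.01 is an arbitrary placeholder, and the function names are made up for this example.

```python
import csv

# Assumed column names in the summary CSV; adjust them to the real header.
GENE_COL = "Gene"
FUNC_COL = "ExonicFunc"
SIFT_COL = "AVSIFT"
MAF_COLS = ("1000g2012apr_ALL", "ESP6500_ALL")


def to_float(value):
    """Return value as a float, or None if the field is empty or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def keep_variant(row, max_maf=0.01, max_sift=0.05):
    """Keep nonsynonymous SNVs that are rare and have a SIFT score below max_sift."""
    if (row.get(FUNC_COL) or "").strip() != "nonsynonymous SNV":
        return False
    # A missing frequency is treated as "not observed in that database", i.e. rare.
    for col in MAF_COLS:
        maf = to_float(row.get(col))
        if maf is not None and maf > max_maf:
            return False
    # A variant without a SIFT score cannot be ranked, so it is dropped here.
    sift = to_float(row.get(SIFT_COL))
    return sift is not None and sift < max_sift


def candidate_genes(rows, **thresholds):
    """Collect the gene names of all rows that pass keep_variant."""
    genes = set()
    for row in rows:
        gene = (row.get(GENE_COL) or "").strip()
        if gene and keep_variant(row, **thresholds):
            genes.add(gene)
    return genes


def candidate_genes_from_csv(path, **thresholds):
    """Same as candidate_genes, reading the rows from a CSV file with a header."""
    with open(path, newline="") as handle:
        return candidate_genes(csv.DictReader(handle), **thresholds)


def compare(tumor_genes, ipsc_genes):
    """Split two gene sets into shared, tumor-only and iPSC-only genes."""
    return {
        "shared": tumor_genes & ipsc_genes,
        "tumor_only": tumor_genes - ipsc_genes,
        "ipsc_only": ipsc_genes - tumor_genes,
    }
```

The intended use would be to call candidate_genes_from_csv once on the tumor summary and once on the iPSC summary (for example with hypothetical file names such as tumor_summary.csv and ipsc_summary.csv) and pass both sets to compare. Dropping variants without a SIFT score is a design choice of this sketch; it may well be too strict in practice.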