Hi friends, I have finished calling variants following the GATK Best Practices (joint calling) on 20 WES samples. My goal is to find the genes that may cause the disease in these 20 samples. I have tried some tools and methods, but I don't know whether they are right. Is there a good guidance document, like the GATK Best Practices, for reaching this goal?
I have tried an annotation tool named ANNOVAR.
First, I converted the VCF into ANNOVAR input format; more precisely, I got 20 ANNOVAR input files, one per sample.
convert2annovar.pl -format vcf4 relapse.filtered.snps.indels.vcf -allsample -filter PASS -out out/relapse
For each sample, I did the following:
1. Filtering out irrelevant (common) variants using the 1000 Genomes Project dataset with a MAF threshold of 0.01.
2. Annotating each variant with gene information. Because the data is WES, I got the "exonic variantfunction" output.
intronic        KCNMA1  chr10   77008214        77008214        T       C       het     127.30  51
exonic  CDHR4   chr3    49795287        49795287        C       T       het     8460.17 274
exonic  CLCN5   chrX    50081733        50081733        A       G       hom     69769.27        255
intronic        TUFT1   chr1    151566031       151566031       -       C       hom     7453.64 18
intronic        PDE4D   chr5    60147674        60147674        T       G       het     1524.08 112
intronic        USF1    chr1    161041947       161041948       GA      -       het     2536.97 91
intronic        RPS6KB2 chr11   67432552        67432552        A       G       het     2772.98 156
intergenic      LINC01296(dist=73295),DUXAP10(dist=113998)      chr14   19180792        19180792        C       A       het     2378.33 172
3. Combining these 20 output files into .MAF format to generate a waterfall plot. However, the mutation percentage is almost 100%, which I don't think is a correct result.
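One possible reason for the near-100% figure, offered only as a guess, is that every annotated row (intronic and intergenic included) is counted as a mutation, so almost every gene looks mutated in almost every sample. The sketch below is a minimal way to check this: it reads rows in the same column layout as the output shown above (region, gene, chr, start, end, ref, alt, ...), keeps only exonic or splicing rows, and reports the fraction of samples carrying a variant in each gene. The function names, the sample names, and the assignment of rows to samples are hypothetical and only for illustration.

```python
from collections import defaultdict


def parse_rows(lines):
    """Yield (region, gene) from rows laid out like the output shown above.

    Assumes whitespace- or tab-separated columns:
    region, gene, chr, start, end, ref, alt, ...
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated rows
        yield fields[0], fields[1]


def gene_recurrence(samples, keep_regions=("exonic", "splicing")):
    """Return {gene: fraction of samples with >= 1 variant in a kept region}.

    The gene column is used verbatim; multi-gene entries are not split here.
    """
    carriers = defaultdict(set)
    for sample, lines in samples.items():
        for region, gene in parse_rows(lines):
            # The region column can be compound, e.g. "exonic;splicing".
            if any(part in keep_regions for part in region.split(";")):
                carriers[gene].add(sample)
    n_samples = len(samples)
    return {gene: len(found) / n_samples for gene, found in carriers.items()}


# Rows copied from the output above; which sample each row belongs to is
# a hypothetical assignment for this example.
samples = {
    "sample1": [
        "intronic\tKCNMA1\tchr10\t77008214\t77008214\tT\tC\thet\t127.30\t51",
        "exonic\tCDHR4\tchr3\t49795287\t49795287\tC\tT\thet\t8460.17\t274",
    ],
    "sample2": [
        "exonic\tCLCN5\tchrX\t50081733\t50081733\tA\tG\thom\t69769.27\t255",
        "intronic\tTUFT1\tchr1\t151566031\t151566031\t-\tC\thom\t7453.64\t18",
    ],
}

recurrence = gene_recurrence(samples)
for gene, fraction in sorted(recurrence.items()):
    print(gene, fraction)
```

Comparing this against a run that keeps every region type would show how much of the near-100% figure comes from non-coding rows. Restricting further to nonsynonymous changes would need the functional consequence column, which is not part of the rows shown here.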

You should consider changing your question to the actual problem you are having, which is "find the genes that may cause the disease". GATK helps you call variants, but it will not help you interpret them.
Thanks for your reply, I added a description of the problem.
You mean GATK, right?
Yes, I got a VCF with gatk_v4.1.0.0; here is part of the output.
But I have no idea what I should do to find these disease genes. Is there a common practice?
What have you tried?
Thanks for your reply. I added a description of the problem, and I tried an annotation tool named 'ANNOVAR'.