I am using GATK 2.3.4 to call variants on my samples. My experimental design is to call variants for each sample (normal, tumor, and the corresponding IPS, i.e. induced pluripotent stem cells derived from the tumor), and then remove the mutations shared between normal and tumor and between normal and IPS, leaving only the mutations exclusive to the tumor and to its IPS. I would then match the tumor-exclusive mutations against the IPS-exclusive mutations to show that the IPS harbors the majority of the mutations that were in its tumor. This would establish that the IPS is derived from its tumor and that the genetic background is maintained in both.
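The comparison described above is plain set arithmetic, sketched below in Python. This is only an illustration under assumptions: variants are keyed by (chromosome, position, REF, ALT), the function names `exclusive` and `overlap_fraction` are hypothetical, the toy variants are made up, and the overlap is reported as a fraction of the tumor-exclusive calls (the choice of denominator is an assumption).

```python
# Sketch of the normal/tumor/IPS comparison as set operations.
# All names and the toy variants below are hypothetical, not GATK output.

def exclusive(sample_variants, normal_variants):
    """Return variants present in the sample but absent from the normal."""
    return sample_variants - normal_variants


def overlap_fraction(tumor_only, ips_only):
    """Fraction of tumor-exclusive variants also found in the IPS-exclusive set."""
    if not tumor_only:
        return 0.0
    return len(tumor_only & ips_only) / len(tumor_only)


# Each variant is keyed by (chrom, pos, ref, alt).
normal = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}
tumor = {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A"),
         ("chr4", 400, "T", "C"), ("chr5", 500, "A", "T")}
ips = {("chr2", 200, "C", "T"), ("chr3", 300, "G", "A"),
       ("chr4", 400, "T", "C"), ("chr6", 600, "G", "C")}

tumor_only = exclusive(tumor, normal)   # germline call chr1:100 removed
ips_only = exclusive(ips, normal)       # germline call chr2:200 removed
shared = tumor_only & ips_only          # somatic calls seen in both
fraction = overlap_fraction(tumor_only, ips_only)
print(f"{len(shared)} shared of {len(tumor_only)} tumor-exclusive ({fraction:.0%})")
```

Keying on REF and ALT as well as position means that two different alleles at the same site are not counted as a match.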
I followed a standard GATK pipeline, but with it I am getting a 40% overlap of mutations between the IPS and its tumor. Is there any way to call variants in GATK based on allele frequency?
Let's say that when I first call variants for each sample, I keep only those mutations for which the ALT allele accounts for more than 20% of the reads, i.e. the mutation frequency is above 20%. This can be computed from the AD values: AD(20,31) gives the unfiltered read counts for a SNP with 20 reads supporting REF and 31 supporting ALT, so the mutation frequency is Mut_freq = ALT/(REF+ALT) = 31/51 ≈ 0.61. That would be a viable mutation call based on the number of reads supporting the ALT allele. I want to discard all SNP calls for which this frequency is less than 0.2, i.e. less than 20%.
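To make the criterion concrete, the arithmetic can be sketched in Python, outside GATK. This is a minimal illustration under assumptions: the AD value is taken as a comma-separated "REF,ALT" string for a biallelic site, the helper names `alt_fraction` and `passes_alt_filter` are hypothetical, and a site at exactly 20% is kept (the boundary case is an assumption).

```python
# Sketch of the ALT-fraction criterion computed from an AD value.
# Assumes a biallelic site with AD given as "REF_count,ALT_count".
# Function names are hypothetical and not part of GATK.

def alt_fraction(ad):
    """Return ALT / (REF + ALT) for an AD string such as '20,31'."""
    ref_reads, alt_reads = (int(x) for x in ad.split(",")[:2])
    depth = ref_reads + alt_reads
    if depth == 0:
        return 0.0  # no informative reads, so treat the fraction as zero
    return alt_reads / depth


def passes_alt_filter(ad, min_fraction=0.2):
    """Keep a call only if the ALT fraction is at least min_fraction."""
    return alt_fraction(ad) >= min_fraction


# The example from the text: 20 REF reads and 31 ALT reads.
print(round(alt_fraction("20,31"), 2))   # 31/51, about 0.61
print(passes_alt_filter("20,31"))        # kept
print(passes_alt_filter("45,5"))         # 5/50 = 0.10, discarded
```

The zero-depth guard matters in practice, because a site with no informative reads would otherwise raise a division error.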
Can this criterion be applied in the GATK pipeline? If so, how?
Should it be applied at the Unified Genotyper step, before VQSR, or after VQSR? If I do it after VQSR, I have already used VQSR's stringency to separate true from false positives in my SNP calls, so ideally it should be done before the VQSR step. In the GATK docs I have only found this kind of filtering applied after VQSR and before SNP annotation, using a JEXL expression, and I am not sure whether that will serve my purpose.
Any suggestions on how to proceed with this kind of model would be much appreciated, preferably with the command line showing how to apply this allele-frequency criterion in GATK. I know this can be done with VarScan, but I am not using VarScan at the moment and would like to do it with GATK.