I have two samples, WT (wildtype) and A6 (CRISPR-mutated). Both were sequenced on Illumina with a targeted panel covering 125 genes. I want to find the variants in A6 relative to WT (that is, treating WT as the reference). I did the following steps:
step1: Aligned the WT FASTQ files to hg19 using BWA-mem
step2: Called variants in WT using GATK-haplotype caller.
step3: Applied the called variants to hg19 to generate a consensus sequence (using GATK FastaAlternateReferenceMaker). I checked for heterozygous biallelic mutations; there were very few, so I did not treat them differently and let the tool decide which allele to select.
step4: Using the consensus sequence as the reference, aligned both A6 and WT and called variants (BWA and GATK again). WT was included as a control, to check whether any variants are still called in it.
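For concreteness, the steps above correspond roughly to the sketch below. It only builds and prints the command lines; it does not run them. It assumes GATK4-style invocation (`gatk ToolName ...`), paired-end reads, samtools for sorting and indexing, and an hg19 FASTA that is already indexed for BWA and GATK. All file names and the read-group values are hypothetical placeholders, not the ones I actually used.

```python
import shlex


def align_and_call(sample, ref, r1, r2, out_prefix):
    """Build shell commands for: BWA-MEM -> sorted BAM -> HaplotypeCaller."""
    q = shlex.quote
    # Read group is an assumption; HaplotypeCaller needs an SM tag in the BAM.
    rg = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    bam = f"{out_prefix}.sorted.bam"
    vcf = f"{out_prefix}.vcf.gz"
    return [
        f"bwa mem -R {q(rg)} {q(ref)} {q(r1)} {q(r2)} | samtools sort -o {q(bam)} -",
        f"samtools index {q(bam)}",
        f"gatk HaplotypeCaller -R {q(ref)} -I {q(bam)} -O {q(vcf)}",
    ]


def make_consensus(ref, vcf, new_ref):
    """Build shell commands for: apply WT variants to hg19, then index for BWA."""
    q = shlex.quote
    # GATK also needs a .fai index and a .dict for the new FASTA;
    # create them if they are not already present next to it.
    return [
        f"gatk FastaAlternateReferenceMaker -R {q(ref)} -V {q(vcf)} -O {q(new_ref)}",
        f"bwa index {q(new_ref)}",
    ]


commands = []
# step1 + step2: WT against hg19
commands += align_and_call("WT", "hg19.fa", "WT_R1.fastq.gz", "WT_R2.fastq.gz", "WT.hg19")
# step3: consensus from the WT calls
commands += make_consensus("hg19.fa", "WT.hg19.vcf.gz", "WT_consensus.fa")
# step4: both samples against the consensus
for sample in ("WT", "A6"):
    commands += align_and_call(
        sample,
        "WT_consensus.fa",
        f"{sample}_R1.fastq.gz",
        f"{sample}_R2.fastq.gz",
        f"{sample}.consensus",
    )

for cmd in commands:
    print(cmd)
```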
Results: Many variants are still called in WT, and there is a large overlap between the variants called in WT and in A6.
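To make "overlap" precise, here is a minimal sketch of how the two call sets can be compared. It assumes both VCFs were called against the same reference and are read as plain (uncompressed) text lines; the records below are made-up toy data, and the function names are my own. Variants are matched on exact (chrom, pos, ref, alt), so indels that are represented differently in the two files would not be matched without prior normalization.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from the lines of a text VCF."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        # Split multi-allelic records so each ALT allele is compared on its own.
        for alt in alts.split(","):
            if alt not in (".", "*", "<NON_REF>"):
                keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_calls(wt_lines, a6_lines):
    """Partition variants into shared, A6-only, and WT-only sets."""
    wt = variant_keys(wt_lines)
    a6 = variant_keys(a6_lines)
    return {"shared": a6 & wt, "a6_only": a6 - wt, "wt_only": wt - a6}


header = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
# Toy records, not real results.
wt_vcf = header + [
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr2\t200\t.\tC\tT,G\t60\tPASS\t.",
]
a6_vcf = header + [
    "chr1\t100\t.\tA\tG\t55\tPASS\t.",
    "chr2\t200\t.\tC\tT\t40\tPASS\t.",
    "chr3\t300\t.\tGAT\tG\t70\tPASS\t.",
]

result = compare_calls(wt_vcf, a6_vcf)
for name, variants in result.items():
    print(name, sorted(variants))
```

In this comparison, the "a6_only" set is what I am ultimately after: variants present in A6 but absent from WT.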
Can anyone suggest a modification to this strategy, or a different approach altogether?