I am working on a matched tumor-normal somatic variant calling pipeline. The steps are as follows:
- Alignment with bwa mem
- Sorting and deduplication with samtools and Picard
- Realignment with GATK
- Base recalibration with GATK
- Somatic mutation calling with MuTect / VarScan2 (for MuTect, I tried both 1.1.7 and MuTect2 with default settings; for VarScan2, I filter reads at a mapping quality threshold of 20 and use processSomatic to pick out the high-confidence calls)
Here are my questions:
After filtering out the variants in introns, UTRs, and ncRNA, there is very little overlap between the MuTect and VarScan2 calls. The overlap between MuTect 1.1.7 and MuTect2 is also very low. I am aware that current somatic mutation callers have a high false-positive rate, but is there a combination of parameter settings that would filter out most of the noise? (I know MuTect2 calls indels while the older version does not.)
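To make "very little overlap" concrete, here is a minimal sketch of how the concordance between two call sets could be measured. It assumes both callers' outputs are available as VCF text, that variants are matched on exact (chrom, pos, ref, alt), and that only records with FILTER `PASS` or `.` count as calls. The function names and the toy records are hypothetical, not output from the actual pipeline, and indels may need left-normalization before an exact match like this is fair.

```python
def load_variants(vcf_lines, pass_only=True):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and header lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alts, _qual, flt = fields[:7]
        # Assumption: only PASS (or unfiltered ".") records are real calls
        if pass_only and flt not in ("PASS", "."):
            continue
        # Split multi-allelic records into one key per ALT allele
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def concordance(a, b):
    """Summarize the overlap between two sets of variant keys."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy records for illustration only
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
caller_a = [
    header,
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr1\t200\t.\tG\tC\t.\tPASS\t.",
    "chr2\t300\t.\tC\tT\t.\tREJECT\t.",  # filtered, so not counted
]
caller_b = [
    header,
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr2\t300\t.\tC\tT\t.\tPASS\t.",
]

result = concordance(load_variants(caller_a), load_variants(caller_b))
print(result)
```

Reporting the caller-specific counts alongside the shared count helps show whether the low overlap comes from one caller being much more permissive than the other.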
I have tried to find a gold-standard reference for whole-exome sequencing, but so far I have only found articles that use NA12878 and simulate tumor mutations on top of the normal sample. Is there a reference dataset I can use to evaluate my pipeline?
COLO829 is another candidate reference. Since it is a whole-genome sequencing sample, would it be reasonable to use it as a reference standard by restricting it to the exonic intervals?
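A minimal sketch of what restricting a whole-genome truth set to exonic intervals and scoring calls against it might look like. It assumes the exome targets come from a BED file (0-based, half-open coordinates) with non-overlapping intervals per chromosome, that variants are keyed as (chrom, pos, ref, alt) with 1-based VCF positions, and that exact key matches count as true positives. All names and toy values are hypothetical; no real COLO829 data is used here.

```python
from bisect import bisect_right


def build_index(bed_rows):
    """Group (chrom, start, end) BED rows by chromosome, sorted by start.

    Assumes intervals on the same chromosome do not overlap
    (merge them first if they do).
    """
    index = {}
    for chrom, start, end in bed_rows:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_targets(index, chrom, pos):
    """Return True if a 1-based VCF position falls inside a BED interval."""
    intervals = index.get(chrom, [])
    pos0 = pos - 1  # convert to BED's 0-based coordinate
    # Find the last interval whose start is <= pos0
    i = bisect_right(intervals, (pos0, float("inf"))) - 1
    if i < 0:
        return False
    start, end = intervals[i]
    return start <= pos0 < end


def restrict(variants, index):
    """Keep only the variants that lie inside the target intervals."""
    return {v for v in variants if in_targets(index, v[0], v[1])}


def precision_recall(calls, truth):
    """Score a call set against a truth set by exact key match."""
    tp = len(calls & truth)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


# Toy data for illustration only
exons = build_index([("chr1", 99, 150), ("chr1", 500, 600)])
truth = {("chr1", 100, "A", "T"), ("chr1", 550, "G", "A"), ("chr1", 1000, "C", "T")}
calls = {("chr1", 100, "A", "T"), ("chr1", 120, "G", "C"), ("chr1", 2000, "T", "G")}

truth_exonic = restrict(truth, exons)
calls_exonic = restrict(calls, exons)
precision, recall = precision_recall(calls_exonic, truth_exonic)
print(precision, recall)
```

Restricting both the truth set and the calls to the same intervals keeps the comparison fair: truth variants outside the capture targets would otherwise count as missed, and off-target calls as false positives.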
I am still a novice in WES. Any reply would be greatly appreciated.