Question: Gold Standard for Human cancer exome sequencing
2.5 years ago by
Jerome Lin20
University of Pittsburgh
Jerome Lin20 wrote:

Hi all.

I am working on a matched tumor-normal somatic variant calling pipeline. The steps are as follows:

  1. bwa mem alignment
  2. sort and deduplicate with samtools and picard
  3. Realignment with GATK
  4. Base recalibration with GATK
  5. Somatic mutation calling with MuTect / VarScan2. For MuTect, I tried both 1.1.7 and 2 with default settings. For VarScan2, I filtered reads at a mapping quality threshold of 20 and used processSomatic to pick out high-confidence calls.

Here are my questions:

  1. After filtering out the variants in introns/UTRs/ncRNA, there is very little intersection between the MuTect and VarScan2 calls. The intersection between MuTect and MuTect2 is also very low. I am aware that the false positive rate of current somatic mutation callers is very high, but is there a combination of parameter settings that can filter out most of the noise? (I know MuTect2 calls indels while the older version doesn't.)

  2. I tried to find a gold-standard reference for whole exome sequencing, but all I've found so far are articles that use NA12878 and simulate tumor mutations on top of a normal sample. Is there a reference I can use to evaluate my pipeline?

  3. COLO829 is another candidate reference. Since it is a whole-genome sequencing sample, would it be reasonable to use it as a reference standard by restricting to the exonic intervals?
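
To make "intersection" in question 1 concrete, here is a minimal sketch of one way to measure the overlap between two call sets. It assumes both callers' outputs have already been converted to VCF and that variants are matched on exact (chrom, pos, ref, alt). The function names and the demo records are made up for illustration. Note that indels written in different but equivalent representations will not match under this scheme, which can make the overlap look lower than it really is.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines.

    Header lines are skipped and multi-allelic records are split so
    that each ALT allele becomes its own key.
    """
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def overlap(a, b):
    """Summarise how two sets of variant keys overlap."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical records standing in for two callers' outputs.
mutect_demo = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t100\t.\tA\tT\n",
    "chr1\t200\t.\tG\tC\n",
    "chr2\t50\t.\tC\tT\n",
]
varscan_demo = [
    "chr1\t100\t.\tA\tT\n",
    "chr2\t50\t.\tC\tG\n",
]

summary = overlap(variant_keys(mutect_demo), variant_keys(varscan_demo))
print(summary)
```

With real files, the lists above would be replaced by open file handles, e.g. `variant_keys(open(path))`.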

I am still a novice in WES. Any reply would be greatly appreciated.


cancer mutect varscan wes somatic • 1.5k views
2.5 years ago by
United States
harold.smith.tarheel4.3k wrote:

The DREAM challenge consortium has generated synthetic cancer data sets for benchmarking (they are whole genome, but you could easily restrict them to the exome targets after alignment). Brad Chapman et al. at Blue Collar Bioinformatics have validated a lot of mutation-calling tools against these data (see here).
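
As a rough sketch of what "restrict to the exome and score against a truth set" can look like: the code below assumes the exome targets are available as BED lines (0-based, half-open) and that both the calls and the truth set have been reduced to (chrom, pos, ref, alt) tuples with 1-based VCF positions. All function names and demo values are hypothetical, and this is only one simple way to do it, not the method used in the benchmarks mentioned above.

```python
from bisect import bisect_right


def load_targets(bed_lines):
    """Parse BED lines into {chrom: (starts, ends)} with merged intervals."""
    raw = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw.setdefault(chrom, []).append((int(start), int(end)))
    targets = {}
    for chrom, ivs in raw.items():
        merged = []
        for s, e in sorted(ivs):
            if merged and s <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        targets[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return targets


def in_targets(targets, chrom, pos):
    """True if a 1-based position falls inside any target interval."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    pos0 = pos - 1  # convert to the 0-based BED coordinate
    i = bisect_right(starts, pos0) - 1
    return i >= 0 and pos0 < ends[i]


def score(calls, truth, targets):
    """Precision and recall of calls against truth, inside targets only."""
    calls_in = {v for v in calls if in_targets(targets, v[0], v[1])}
    truth_in = {v for v in truth if in_targets(targets, v[0], v[1])}
    tp = len(calls_in & truth_in)
    return {
        "tp": tp,
        "fp": len(calls_in - truth_in),
        "fn": len(truth_in - calls_in),
        "precision": tp / len(calls_in) if calls_in else 0.0,
        "recall": tp / len(truth_in) if truth_in else 0.0,
    }


# Hypothetical targets, truth set, and calls for illustration.
bed_demo = ["chr1\t99\t200\n", "chr2\t0\t60\n"]
truth_demo = {("chr1", 100, "A", "T"), ("chr1", 150, "G", "C"), ("chr1", 500, "C", "T")}
calls_demo = {("chr1", 100, "A", "T"), ("chr2", 50, "C", "G"), ("chr1", 500, "C", "T")}

result = score(calls_demo, truth_demo, load_targets(bed_demo))
print(result)
```

Variants outside the targets (chr1:500 in the demo) are dropped from both sets before scoring, so they count as neither true nor false positives.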

