Question: GATK Workflow for Cancer Samples
2.2 years ago
United States
I am just starting to learn to use bioinformatics tools. My university has a limited and expensive bioinformatics team, so I'm mostly on my own except for big questions.

I am planning to use GATK to run 58 tumor/normal pairs of Illumina exome sequencing data through the pipeline, starting from FASTQ or BAM files, and to output VCF and MAF files for analysis.

The standard GATK pipeline is designed for germline disease studies, not cancer, so I was wondering whether anyone knows what changes should be made for cancer samples. Here's the current pipeline, starting from BAM files:

  • (Non-GATK) Picard MarkDuplicates or Samtools rmdup
  • Indel Realignment (RealignerTargetCreator + IndelRealigner)
  • Base Quality Score Recalibration (BaseRecalibrator + PrintReads)
  • HaplotypeCaller <- I've been told this is for germline variants; what can I use for somatic variants?
  • VQSR (VariantRecalibrator and ApplyRecalibration in SNP and INDEL mode)
  • Annotation using Oncotator (?)

I'd like some confirmation that this pipeline will produce what I need to run my samples through MuTect, MutSig, or another analysis program. I appreciate any advice.

Crossposted on Stack Exchange Biology.

2.2 years ago
Cambridge, MA
We (the GATK docs team) are working on documentation for the somatic variant calling use case. In a nutshell, you'll need an additional pre-processing step called co-cleaning, where you perform indel realignment on the tumor and normal of each pair together. Then use ContEst to estimate cross-sample contamination, and use MuTect to call variants (not HaplotypeCaller, which cannot call low-allele-fraction variants the way MuTect can). Next, do some manual filtering and processing to eliminate artifacts (VQSR is not appropriate for somatic calls), and finally annotate with Oncotator.
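To make the "use MuTect instead of HaplotypeCaller" step concrete, here is a minimal sketch that builds a MuTect 1.x command line for one tumor/normal pair. It assumes MuTect 1.1.x run as a standalone jar; the jar name, file paths, and sample IDs are hypothetical placeholders, and the arguments shown are the commonly documented ones (check your MuTect version's help output before relying on them). The key point is that both BAMs go into the same run, tagged `:normal` and `:tumor`.

```python
import shlex

# Hypothetical jar name; assumes a MuTect 1.x standalone jar is available locally.
MUTECT_JAR = "muTect-1.1.4.jar"


def mutect_command(ref, tumor_bam, normal_bam, cosmic, dbsnp, intervals, prefix):
    """Build a MuTect 1.x command (as an argv list) for one tumor/normal pair."""
    return [
        "java", "-Xmx2g", "-jar", MUTECT_JAR,
        "--analysis_type", "MuTect",
        "--reference_sequence", ref,
        "--cosmic", cosmic,
        "--dbsnp", dbsnp,
        "--intervals", intervals,  # e.g. the exome capture targets
        # The :normal / :tumor tags tell MuTect which BAM plays which role.
        "--input_file:normal", normal_bam,
        "--input_file:tumor", tumor_bam,
        "--out", f"{prefix}.call_stats.txt",
        "--coverage_file", f"{prefix}.coverage.wig.txt",
    ]


if __name__ == "__main__":
    # Hypothetical pair list; in practice this would hold all 58 pairs.
    pairs = [("P01", "P01_tumor.bam", "P01_normal.bam")]
    for sample_id, tumor, normal in pairs:
        cmd = mutect_command("ref.fasta", tumor, normal,
                             "cosmic.vcf", "dbsnp.vcf",
                             "targets.interval_list", sample_id)
        print(shlex.join(cmd))
```

Building the commands as lists (rather than strings) keeps them easy to inspect or pass to `subprocess.run` later without shell-quoting problems.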

We're happy to provide more details in the GATK support forum. The CGA homepage is also a good resource.

"pre-processing step called co-cleaning where you perform indel realignment on the tumor and normal in a pair together,"

I am confused: since we have separate intervals for the tumor and the normal sample, how do we feed them in together for co-realignment? Do you merge them into one file? I tried, but I can't give both files as input.
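A minimal sketch of how co-realignment is usually set up in GATK 3.x, assuming the GATK 3 jar and hypothetical file names: instead of merging two intervals files, you pass both the tumor and normal BAMs to the same RealignerTargetCreator run (repeated `-I`), which produces a single intervals file for the pair. IndelRealigner then takes both BAMs plus that intervals file, and `-nWayOut` should write one realigned BAM per input rather than one merged BAM. Verify the exact output naming against your GATK version's documentation.

```python
import shlex

# Hypothetical jar path; assumes GATK 3.x, where tools are selected with -T.
GATK_JAR = "GenomeAnalysisTK.jar"


def corealign_commands(ref, tumor_bam, normal_bam, known_indels, prefix):
    """Build GATK 3 commands that realign a tumor/normal pair together."""
    intervals = f"{prefix}.intervals"
    # Both BAMs go into one run, so realignment targets are found jointly.
    target_creator = [
        "java", "-jar", GATK_JAR,
        "-T", "RealignerTargetCreator",
        "-R", ref,
        "-I", tumor_bam,
        "-I", normal_bam,
        "-known", known_indels,
        "-o", intervals,
    ]
    # Same two inputs again; -nWayOut keeps tumor and normal as separate outputs.
    realigner = [
        "java", "-jar", GATK_JAR,
        "-T", "IndelRealigner",
        "-R", ref,
        "-I", tumor_bam,
        "-I", normal_bam,
        "-known", known_indels,
        "-targetIntervals", intervals,
        "-nWayOut", ".realigned.bam",
    ]
    return [target_creator, realigner]


if __name__ == "__main__":
    for cmd in corealign_commands("ref.fasta", "tumor.bam", "normal.bam",
                                  "known_indels.vcf", "pair01"):
        print(shlex.join(cmd))
```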

Since this has been posted, this is the only additional information I've found. Be sure to read the comments, as there's some good info there!

If there's updated documentation, please post a link here.

21 months ago
United States/Baltimore/Johns Hopkins University
For your annotation step, you might want to consider using CRAVAT. Here's a post explaining the web tool. The CHASM scores from CRAVAT in particular would probably be of interest to you! Also, CRAVAT will be releasing a new interactive graphical explorer tool in a couple of weeks, and we will post details here when it's up.

