Question: GATK Workflow for GATK in Cancer Samples
2.2 years ago by gaiusjaugustus (United States) wrote:

I am just starting to learn to use bioinformatics tools. My university has a limited and expensive bioinformatics team, so I'm mostly on my own except for big questions.

I am planning to use GATK to run 58 tumor/normal pairs of exome sequencing data (Illumina), starting from FASTQ or BAM files, through the pipeline, with output in VCF and MAF format for analysis.

The current GATK pipeline is used for disease but not cancer, so I was wondering if anyone knew if there should be changes made for cancer. Here's the current pipeline starting with BAM files:

  • (Non-GATK) Picard MarkDuplicates or Samtools rmdup
  • Indel Realignment (RealignerTargetCreator + IndelRealigner)
  • Base Quality Score Recalibration (BaseRecalibrator + PrintReads)
  • HaplotypeCaller <- I've been told this is for germline variants; what can I use for somatic variants?
  • VQSR (VariantRecalibrator and ApplyRecalibration in SNP and INDEL mode)
  • Annotation using Oncotator (?)
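
For reference, the pre-processing part of the list above (the first three bullets) might be scripted roughly as follows. This is a minimal Python sketch that only assembles the command lines and prints them, without running anything. The jar names, file names, and the `preprocessing_commands` helper are hypothetical, and the flags are written in GATK 3 / Picard style as an assumption, so check them against the documentation for the versions you have installed.

```python
# Hypothetical paths and jar names; replace with your own.
REF = "reference.fasta"
KNOWN_SITES = "dbsnp.vcf"
GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]
PICARD = ["java", "-jar", "picard.jar"]


def preprocessing_commands(sample):
    """Return, in order, the per-sample pre-processing commands for one BAM.

    Flags follow GATK 3 / Picard conventions (an assumption; verify locally).
    """
    raw = f"{sample}.bam"
    dedup = f"{sample}.dedup.bam"
    realigned = f"{sample}.realigned.bam"
    recal = f"{sample}.recal.bam"
    targets = f"{sample}.intervals"
    table = f"{sample}.recal.table"
    return [
        # 1. Mark duplicates (non-GATK step)
        PICARD + ["MarkDuplicates", f"I={raw}", f"O={dedup}",
                  f"M={sample}.dup_metrics.txt"],
        # 2. Indel realignment: find target intervals, then realign
        GATK + ["-T", "RealignerTargetCreator", "-R", REF,
                "-I", dedup, "-o", targets],
        GATK + ["-T", "IndelRealigner", "-R", REF, "-I", dedup,
                "-targetIntervals", targets, "-o", realigned],
        # 3. Base quality score recalibration: build the table, then apply it
        GATK + ["-T", "BaseRecalibrator", "-R", REF, "-I", realigned,
                "-knownSites", KNOWN_SITES, "-o", table],
        GATK + ["-T", "PrintReads", "-R", REF, "-I", realigned,
                "-BQSR", table, "-o", recal],
    ]


if __name__ == "__main__":
    for cmd in preprocessing_commands("sample01"):
        print(" ".join(cmd))
```

Building the commands as lists keeps each step easy to inspect before handing it to a job scheduler or `subprocess.run`.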

I'd like some verification that this pipeline will output what I need to run my samples on MuTect, MutSig, or some other analysis program. I appreciate any advice.

Crossposted on Stack Exchange Biology.

2.2 years ago by vdauwera (Cambridge, MA) wrote:

We (GATK docs team) are working on some docs for the somatic variant calling use case. In a nutshell, you'll need to do an additional pre-processing step called co-cleaning where you perform indel realignment on the tumor and normal in a pair together, use ContEst to estimate cross-sample contamination, use MuTect to call variants (not HaplotypeCaller, which, unlike MuTect, cannot call low-allele-fraction variants), do some manual filtering and processing to eliminate artifacts (VQSR is not appropriate for somatic calls), and finally annotate with Oncotator.

We're happy to provide more details in the GATK support forum. The CGA homepage is also a good resource.
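
As a rough illustration of the co-cleaning and calling steps described above, here is a minimal Python sketch that assembles the command lines for one tumor/normal pair without running them. It assumes GATK 3-style realignment, where both BAMs are passed as separate `-I` inputs and `-nWayOut` asks IndelRealigner for one output BAM per input, and a MuTect 1-style call with the BAMs tagged as tumor and normal. The file names, jar names, and the `cocleaning_commands` and `mutect_command` helpers are hypothetical, and the flags should be checked against your installed versions. ContEst, filtering, and Oncotator are not shown.

```python
# Hypothetical paths and jar names; replace with your own.
REF = "reference.fasta"
GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]
MUTECT = ["java", "-jar", "muTect.jar"]


def cocleaning_commands(tumor_bam, normal_bam, targets="pair.intervals"):
    """Joint indel realignment of a tumor/normal pair (GATK 3-style flags).

    Both BAMs are given as separate -I inputs, so they share one set of
    realignment targets; no merged BAM file is needed.
    """
    inputs = ["-I", tumor_bam, "-I", normal_bam]
    create_targets = (GATK + ["-T", "RealignerTargetCreator", "-R", REF]
                      + inputs + ["-o", targets])
    realign = (GATK + ["-T", "IndelRealigner", "-R", REF] + inputs
               + ["-targetIntervals", targets,
                  # one realigned BAM per input instead of a single -o file
                  "-nWayOut", ".cocleaned.bam"])
    return [create_targets, realign]


def mutect_command(tumor_bam, normal_bam, out="call_stats.txt"):
    """Somatic calling on the co-cleaned pair (MuTect 1-style flags)."""
    return MUTECT + ["--analysis_type", "MuTect",
                     "--reference_sequence", REF,
                     "--input_file:normal", normal_bam,
                     "--input_file:tumor", tumor_bam,
                     "--out", out]


if __name__ == "__main__":
    for cmd in cocleaning_commands("tumor.bam", "normal.bam"):
        print(" ".join(cmd))
    print(" ".join(mutect_command("tumor.cocleaned.bam",
                                  "normal.cocleaned.bam")))
```

The output names passed to `mutect_command` in the usage lines are placeholders; the actual names written by `-nWayOut` depend on the suffix and the input file names.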


"pre-processing step called co-cleaning where you perform indel realignment on the tumor and normal in a pair together,"

I am confused: since we need separate intervals for the tumor and the normal sample, how do we feed them in together for co-realignment? Do you merge them into one file? I tried, but I can't give both files as input.

written 11 months ago by always_learning780

Since this has been posted, this is the only additional information I've found. Be sure to read the comments, as there's some good info there!

If there's updated documentation, please post a link here.

written 16 months ago by gaiusjaugustus
21 months ago by r0ntu (United States/Baltimore/Johns Hopkins University) wrote:

For your annotation step, you might want to consider using CRAVAT. Here's a post explaining the web tool. The CHASM scores from CRAVAT in particular would probably be of interest to you! Also, CRAVAT will be releasing a new graphical interactive explorer tool in a couple of weeks, and we will post details here when it's up.
