GATK Workflow for GATK in Cancer Samples
8.9 years ago

I am just starting to learn to use bioinformatics tools. My university has a limited and expensive bioinformatics team, so I'm mostly on my own except for big questions.

I am planning to use GATK to run 58 tumor/normal pairs of exome sequencing data (Illumina) through the pipeline, starting from FASTQ or BAM files and producing VCF and MAF output for analysis.

The current GATK pipeline is meant for disease studies rather than cancer, so I was wondering whether changes are needed for cancer samples. Here's the current pipeline, starting from BAM files:

  • (Non-GATK) Picard MarkDuplicates or Samtools rmdup
  • Indel Realignment (RealignerTargetCreator + IndelRealigner)
  • Base Quality Score Recalibration (BaseRecalibrator + PrintReads)
  • HaplotypeCaller <- I've been told this is for germline variants; what can I use for somatic variants?
  • VQSR (VariantRecalibrator and ApplyRecalibration in SNP and INDEL mode)
  • Annotation using Oncotator (?)
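
For reference, the pre-processing steps in the list above (duplicate marking, indel realignment, BQSR) could be sketched roughly as follows. This is only an illustration under assumptions: it uses GATK 3-style command lines, the jar locations, file names and the helper `preprocessing_commands` are hypothetical, and the script only builds the commands without running them.

```python
from typing import List

# Assumed jar location; adjust to wherever GATK 3 is installed.
GATK3 = ["java", "-jar", "GenomeAnalysisTK.jar"]


def preprocessing_commands(sample: str, ref: str, known_sites: str) -> List[List[str]]:
    """Build per-sample pre-processing commands as argument lists (nothing is executed)."""
    dedup = f"{sample}.dedup.bam"
    intervals = f"{sample}.realign.intervals"
    realigned = f"{sample}.realigned.bam"
    table = f"{sample}.recal.table"
    return [
        # Mark duplicates with Picard.
        ["java", "-jar", "picard.jar", "MarkDuplicates",
         f"I={sample}.bam", f"O={dedup}", f"M={sample}.dup_metrics.txt"],
        # Indel realignment, step 1: find intervals that need realignment.
        GATK3 + ["-T", "RealignerTargetCreator", "-R", ref,
                 "-I", dedup, "-o", intervals],
        # Indel realignment, step 2: realign reads over those intervals.
        GATK3 + ["-T", "IndelRealigner", "-R", ref, "-I", dedup,
                 "-targetIntervals", intervals, "-o", realigned],
        # BQSR, step 1: model base quality errors against known variant sites.
        GATK3 + ["-T", "BaseRecalibrator", "-R", ref, "-I", realigned,
                 "-knownSites", known_sites, "-o", table],
        # BQSR, step 2: write a BAM with recalibrated qualities.
        GATK3 + ["-T", "PrintReads", "-R", ref, "-I", realigned,
                 "-BQSR", table, "-o", f"{sample}.recal.bam"],
    ]


if __name__ == "__main__":
    for cmd in preprocessing_commands("tumor01", "ref.fasta", "dbsnp.vcf"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.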

I'd like some verification that this pipeline will output what I need to run my samples on MuTect, MutSig, or some other analysis program. I appreciate any advice.

Crossposted on Stack Exchange Biology.

GATK exome • 6.0k views

The analysis can be done in a pretty good way using this link Link

8.8 years ago
vdauwera ★ 1.2k

We (GATK docs team) are working on some docs for the somatic variant calling use case. In a nutshell, you'll need to: do an additional pre-processing step called co-cleaning where you perform indel realignment on the tumor and normal in a pair together; use ContEst to estimate cross-sample contamination; use MuTect to call variants (not HaplotypeCaller, which, unlike MuTect, cannot call low-allele-fraction variants); do some manual filtering and processing to eliminate artifacts (VQSR is not appropriate for somatic calls); and finally annotate with Oncotator.
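
As a rough illustration of the co-cleaning and calling steps, here is a minimal Python sketch that only builds the command lines (it does not run them). It assumes GATK 3-style invocation and a MuTect 1 jar; the jar paths, file names and the helpers `co_cleaning_commands` and `mutect_command` are hypothetical. The main point is that the tumor and normal BAMs are not merged: both are passed with their own `-I`, and `-nWayOut` asks IndelRealigner to write one realigned BAM per input.

```python
from typing import List

# Assumed jar locations; adjust to your installation.
GATK3 = ["java", "-jar", "GenomeAnalysisTK.jar"]
MUTECT1 = ["java", "-jar", "muTect.jar"]


def co_cleaning_commands(tumor_bam: str, normal_bam: str,
                         ref: str, pair_id: str) -> List[List[str]]:
    """Build the two joint-realignment commands for one tumor/normal pair."""
    intervals = f"{pair_id}.realign.intervals"
    # One shared interval list is created from both BAMs of the pair.
    create = GATK3 + ["-T", "RealignerTargetCreator", "-R", ref,
                      "-I", tumor_bam, "-I", normal_bam, "-o", intervals]
    # Both BAMs are realigned together; -nWayOut yields one output per input
    # and is used instead of a single -o output.
    realign = GATK3 + ["-T", "IndelRealigner", "-R", ref,
                       "-I", tumor_bam, "-I", normal_bam,
                       "-targetIntervals", intervals,
                       "-nWayOut", ".realigned.bam"]
    return [create, realign]


def mutect_command(tumor_bam: str, normal_bam: str,
                   ref: str, pair_id: str) -> List[str]:
    """Build a MuTect 1 call on the cleaned BAMs of one pair."""
    return MUTECT1 + ["--analysis_type", "MuTect",
                      "--reference_sequence", ref,
                      "--input_file:normal", normal_bam,
                      "--input_file:tumor", tumor_bam,
                      "--out", f"{pair_id}.call_stats.txt"]


if __name__ == "__main__":
    for cmd in co_cleaning_commands("tumor.bam", "normal.bam", "ref.fasta", "pair01"):
        print(" ".join(cmd))
    print(" ".join(mutect_command("tumor.clean.bam", "normal.clean.bam",
                                  "ref.fasta", "pair01")))
```

Contamination estimation with ContEst, artifact filtering and Oncotator annotation are left out of this sketch.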

We're happy to provide more details in the GATK support forum. The CGA homepage (http://www.broadinstitute.org/cancer/cga/Home) is also a good resource.


"pre-processing step called co-cleaning where you perform indel realignment on the tumor and normal in a pair together,"

I am confused: since we have separate intervals for the tumor and the normal sample, how do we feed them in together for co-realignment? Do you merge them into one file? I tried, but I can't give both files as input.


Since this has been posted, this is the only additional information I've found. Be sure to read the comments, as there's some good info there! http://gatkforums.broadinstitute.org/wdl/discussion/5963/tumor-normal-paired-exome-sequencing-pipeline

If there's updated documentation, please post a link here.


Update to my answer: we now provide fully configured pipelines in a cloud platform called Terra that we built on top of Google Cloud. Terra is freely accessible, with compute and storage billed directly by Google. You get a $300 Google credit when you sign up for Terra, so you can get a lot of work done without paying a cent.

The pipelines are fully set up in workspaces that include test data and cost+runtime estimates. For example you can check out the pipeline for somatic short variant calling (using Mutect2) here: https://app.terra.bio/#workspaces/help-gatk/Somatic-SNVs-Indels-GATK4
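
For anyone who wants to see the core of that workflow outside Terra, the two central GATK4 commands can be sketched as below. This is a minimal sketch, not the full workspace pipeline: file names and the helper `mutect2_commands` are hypothetical, `normal_sample` is assumed to be the SM tag from the normal BAM's read group, and optional inputs such as a panel of normals or germline resource are omitted. The script only builds the commands.

```python
from typing import List


def mutect2_commands(tumor_bam: str, normal_bam: str, normal_sample: str,
                     ref: str, pair_id: str) -> List[List[str]]:
    """Build Mutect2 and FilterMutectCalls commands for one tumor/normal pair."""
    unfiltered = f"{pair_id}.unfiltered.vcf.gz"
    # Call somatic SNVs and indels; -normal names the matched normal sample.
    call = ["gatk", "Mutect2", "-R", ref,
            "-I", tumor_bam, "-I", normal_bam,
            "-normal", normal_sample,
            "-O", unfiltered]
    # Apply Mutect2's own filters (this replaces VQSR for somatic calls).
    filt = ["gatk", "FilterMutectCalls", "-R", ref,
            "-V", unfiltered, "-O", f"{pair_id}.filtered.vcf.gz"]
    return [call, filt]


if __name__ == "__main__":
    for cmd in mutect2_commands("tumor.bam", "normal.bam", "NORMAL_SM",
                                "ref.fasta", "pair01"):
        print(" ".join(cmd))
```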

You can also bring your own tools & pipelines, btw, it's not restricted to GATK or Broad tools. There's a Terra showcase that presents a variety of preloaded analyses from various groups, if you want to check that out: https://app.terra.bio/#library/showcase

FYI this blog post describes how running pipelines on Terra works if you want to get a sense of that first: https://software.broadinstitute.org/gatk/blog?id=24139

And this post shows how to run individual GATK commands in the cloud in Jupyter notebooks, which we're using going forward to provide hands-on tutorials both in our workshops and for self-service learning: https://software.broadinstitute.org/gatk/blog?id=24175

8.4 years ago
r0ntu ▴ 50

For your annotation step, you might want to consider using CRAVAT. Here's a post explaining the web tool. The CHASM scores from CRAVAT in particular would probably be of interest to you! Also, CRAVAT will be releasing a new graphical interactive explorer tool in a couple of weeks and we will post details here when it's up.
