Question: GATK Workflow for GATK in Cancer Samples
4.3 years ago by
United States
gaiusjaugustus150 wrote:

I am just starting to learn to use bioinformatics tools. My university has a limited and expensive bioinformatics team, so I'm mostly on my own except for big questions.

I am planning to use GATK to run 58 tumor/normal pairs of exome sequencing data (Illumina), starting from FASTQ or BAM files, through the pipeline, with output in VCF and MAF format for analysis.

The current GATK pipeline is described for disease studies but not for cancer, so I was wondering whether anyone knows if changes should be made for cancer samples. Here's the current pipeline starting with BAM files:

  • (Non-GATK) Picard MarkDuplicates or Samtools rmdup
  • Indel Realignment (RealignerTargetCreator + IndelRealigner)
  • Base Quality Score Recalibration (BaseRecalibrator + PrintReads)
  • HaplotypeCaller <- I've been told this is for germline variants; what can I use for somatic variants?
  • VQSR (VariantRecalibrator and ApplyRecalibration in SNP and INDEL mode)
  • Annotation using Oncotator (?)
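
For reference, the pre-processing part of the list above (duplicate marking through base recalibration) can be sketched as a small Python helper that only builds the command lines. This is a minimal sketch, not an official pipeline: it assumes GATK 3.x-style invocation (`-T <tool>`), and the jar paths, file names, and the `preprocessing_commands` helper itself are placeholders made up for illustration. BAM indexing between steps is left out.

```python
from typing import List

GATK_JAR = "GenomeAnalysisTK.jar"  # assumed path to a GATK 3.x jar
PICARD_JAR = "picard.jar"          # assumed path to a Picard jar


def gatk3(tool: str, *args: str) -> List[str]:
    """Build a GATK 3.x command line for one tool."""
    return ["java", "-jar", GATK_JAR, "-T", tool, *args]


def preprocessing_commands(sample: str, ref: str, known_sites: str) -> List[List[str]]:
    """Return the per-sample pre-processing commands, in order.

    All file names are placeholders derived from the sample name.
    """
    raw = f"{sample}.bam"
    dedup = f"{sample}.dedup.bam"
    intervals = f"{sample}.realign.intervals"
    realigned = f"{sample}.realigned.bam"
    table = f"{sample}.recal.table"
    recal = f"{sample}.recal.bam"
    return [
        # Mark duplicates with Picard (non-GATK step)
        ["java", "-jar", PICARD_JAR, "MarkDuplicates",
         f"I={raw}", f"O={dedup}", f"M={sample}.dup_metrics.txt"],
        # Indel realignment: find target intervals, then realign
        gatk3("RealignerTargetCreator", "-R", ref, "-I", dedup, "-o", intervals),
        gatk3("IndelRealigner", "-R", ref, "-I", dedup,
              "-targetIntervals", intervals, "-o", realigned),
        # Base quality score recalibration: build the table, then apply it
        gatk3("BaseRecalibrator", "-R", ref, "-I", realigned,
              "-knownSites", known_sites, "-o", table),
        gatk3("PrintReads", "-R", ref, "-I", realigned, "-BQSR", table, "-o", recal),
    ]


if __name__ == "__main__":
    for cmd in preprocessing_commands("tumor01", "ref.fasta", "dbsnp.vcf"):
        print(" ".join(cmd))
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.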

I'd like some verification that this pipeline will output what I need to run my samples on MuTect, MutSig, or some other analysis program. I appreciate any advice.

Crossposted on Stack Exchange Biology.

gatk exome • 4.3k views
modified 3.9 years ago by r0ntu50 • written 4.3 years ago by gaiusjaugustus150

The analysis can be done quite well by following this link.

written 2.0 years ago by Karma210
4.3 years ago by
vdauwera960
Cambridge, MA
vdauwera960 wrote:

We (GATK docs team) are working on some docs for the somatic variant calling use case. In a nutshell, you'll need an additional pre-processing step called co-cleaning, in which you perform indel realignment on the tumor and normal of a pair together. After that, use ContEst to estimate cross-sample contamination and MuTect to call variants (not HC, which, unlike MuTect, cannot call low-AF variants). Then do some manual filtering and processing to eliminate artifacts (VQSR is not appropriate for somatic calls), and finally annotate with Oncotator.

We're happy to provide more details in the GATK support forum. The CGA homepage (http://www.broadinstitute.org/cancer/cga/Home) is also a good resource.
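
As a rough illustration of the co-cleaning step described above, the sketch below builds the two realignment commands for one tumor/normal pair. It assumes GATK 3.x, where both BAMs are passed as separate `-I` arguments to each tool (no merging into one file) and `-nWayOut` asks IndelRealigner to write one output BAM per input instead of a single `-o` file. The jar path, file names, and the `co_cleaning_commands` helper are hypothetical.

```python
from typing import List

GATK_JAR = "GenomeAnalysisTK.jar"  # assumed path to a GATK 3.x jar


def co_cleaning_commands(tumor_bam: str, normal_bam: str, ref: str,
                         intervals: str = "pair.realign.intervals",
                         suffix: str = ".cocleaned.bam") -> List[List[str]]:
    """Build joint indel-realignment commands for one tumor/normal pair."""
    base = ["java", "-jar", GATK_JAR]
    # Both BAMs go in together, each with its own -I flag
    inputs = ["-I", tumor_bam, "-I", normal_bam]
    # Step 1: one shared set of realignment targets for the pair
    create_targets = (base + ["-T", "RealignerTargetCreator", "-R", ref]
                      + inputs + ["-o", intervals])
    # Step 2: realign both BAMs over those targets; -nWayOut replaces -o
    realign = (base + ["-T", "IndelRealigner", "-R", ref]
               + inputs + ["-targetIntervals", intervals, "-nWayOut", suffix])
    return [create_targets, realign]


if __name__ == "__main__":
    for cmd in co_cleaning_commands("tumor.bam", "normal.bam", "ref.fasta"):
        print(" ".join(cmd))
```

The point of the design is that a single interval list is computed from both samples, so the tumor and normal are realigned consistently at the same sites.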

modified 4.3 years ago • written 4.3 years ago by vdauwera960

"pre-processing step called co-cleaning where you perform indel realignment on the tumor and normal in a pair together,"

I am confused: since we need separate intervals for the tumor and the normal sample, how do we feed them in together for co-realignment? Do you merge them into one file? I tried, but I can't give both files as input.

modified 3.1 years ago • written 3.1 years ago by always_learning980

Since this was posted, the following is the only additional information I've found. Be sure to read the comments, as there's some good info there! http://gatkforums.broadinstitute.org/wdl/discussion/5963/tumor-normal-paired-exome-sequencing-pipeline

If there is updated documentation, please post a link here.

written 3.5 years ago by gaiusjaugustus150

Update to my answer: we now provide fully configured pipelines on a cloud platform called Terra, which we built on top of Google Cloud. Terra is freely accessible, with compute and storage billed directly by Google. You get a $300 Google credit when you sign up for Terra, so you can get a lot of work done without paying a cent.

The pipelines are fully set up in workspaces that include test data and cost+runtime estimates. For example you can check out the pipeline for somatic short variant calling (using Mutect2) here: https://app.terra.bio/#workspaces/help-gatk/Somatic-SNVs-Indels-GATK4
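
To give a sense of the core of somatic short variant calling with Mutect2, here is a minimal sketch that builds the calling and filtering commands for one pair. It assumes a GATK4 installation with the `gatk` wrapper on the PATH; file names, the sample name, and the `mutect2_commands` helper are placeholders. It is not a copy of what the workspace runs, and a production pipeline would normally include further steps (for example contamination estimation) that are left out here.

```python
from typing import List


def mutect2_commands(tumor_bam: str, normal_bam: str, normal_sample: str,
                     ref: str, prefix: str) -> List[List[str]]:
    """Build Mutect2 calling and filtering commands for one tumor/normal pair."""
    unfiltered = f"{prefix}.unfiltered.vcf.gz"
    filtered = f"{prefix}.filtered.vcf.gz"
    # Call somatic variants; -normal takes the sample name from the
    # normal BAM's read group (SM tag), not a file path
    call = ["gatk", "Mutect2", "-R", ref,
            "-I", tumor_bam, "-I", normal_bam,
            "-normal", normal_sample, "-O", unfiltered]
    # Apply Mutect2's own filters (VQSR is not used for somatic calls)
    filt = ["gatk", "FilterMutectCalls", "-R", ref,
            "-V", unfiltered, "-O", filtered]
    return [call, filt]


if __name__ == "__main__":
    for cmd in mutect2_commands("tumor.bam", "normal.bam", "NORMAL01",
                                "ref.fasta", "pair01"):
        print(" ".join(cmd))
```

Some early GATK 4.0 releases also expect the tumor sample name via `-tumor`, so check the tool documentation for the version you have installed.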

You can also bring your own tools & pipelines, btw, it's not restricted to GATK or Broad tools. There's a Terra showcase that presents a variety of preloaded analyses from various groups, if you want to check that out: https://app.terra.bio/#library/showcase

FYI this blog post describes how running pipelines on Terra works if you want to get a sense of that first: https://software.broadinstitute.org/gatk/blog?id=24139

And this post shows how to run individual GATK commands in the cloud in Jupyter notebooks, which we're using going forward to provide hands-on tutorials both in our workshops and for self-service learning: https://software.broadinstitute.org/gatk/blog?id=24175

written 11 weeks ago by vdauwera960
3.9 years ago by
r0ntu50
United States/Baltimore/Johns Hopkins University
r0ntu50 wrote:

For your annotation step, you might want to consider using CRAVAT. Here's a post explaining the web tool. The CHASM scores from CRAVAT in particular would probably be of interest to you! Also, CRAVAT will be releasing a new interactive graphical explorer tool in a couple of weeks, and we will post details here when it's up.

written 3.9 years ago by r0ntu50