Question: How to proceed after alignment when analyzing whole exome sequencing data
6 months ago by
Learner 160 wrote:

I have read many posts on this website, but most of them are old. I used this post to build my pipeline, but it is very old now (What Is The Best Pipeline For Human Whole Exome Sequencing?).

Could anyone give me an update for the current Genome Analysis Toolkit (GATK)? I cannot find the best way to do steps 6 to 11 of that post, and GATK has also changed, as documented here:

https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.3.0/

genomics • 313 views
ADD COMMENTlink modified 6 months ago by manuel.belmadani1.1k • written 6 months ago by Learner 160

Did you check v4 best practices? What specifically is troubling you?

https://software.broadinstitute.org/gatk/best-practices/workflow

ADD REPLYlink written 6 months ago by Santosh Anand4.9k

Probably the "Exome" part; I think the Broad guides assume you have WGS.

ADD REPLYlink written 6 months ago by manuel.belmadani1.1k

@manuel.belmadani WGS is different from WES, but the process should be rather similar.

ADD REPLYlink written 6 months ago by Learner 160

Yes, that's why I was suggesting that using the Broad best practices might not be completely appropriate. I'd be concerned that the base/variant recalibration steps would differ for WES. It's been asked in their forum, and there are some answers about specific steps, but no comprehensive guide for WES afaik.

ADD REPLYlink written 6 months ago by manuel.belmadani1.1k
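
One adjustment commonly applied for exomes is to restrict base recalibration to the capture targets with the -L interval argument. The following is a minimal Python sketch, assuming GATK4 with the gatk launcher on the PATH. The helper function names and all file names are hypothetical placeholders, and the functions only build the command lines rather than run them.

```python
from typing import List


def base_recalibrator_cmd(bam: str, ref: str, known_sites: List[str],
                          targets: str, out_table: str) -> List[str]:
    """Build (not run) a GATK4 BaseRecalibrator command limited to exome targets."""
    cmd = [
        "gatk", "BaseRecalibrator",
        "-R", ref,        # reference FASTA
        "-I", bam,        # duplicate-marked, coordinate-sorted BAM
        "-L", targets,    # capture-kit intervals (placeholder file name)
        "-O", out_table,  # recalibration table written by this step
    ]
    # One --known-sites argument per resource VCF (e.g. dbSNP, known indels).
    for vcf in known_sites:
        cmd += ["--known-sites", vcf]
    return cmd


def apply_bqsr_cmd(bam: str, ref: str, table: str, out_bam: str) -> List[str]:
    """Build (not run) the matching ApplyBQSR command."""
    return [
        "gatk", "ApplyBQSR",
        "-R", ref,
        "-I", bam,
        "--bqsr-recal-file", table,
        "-O", out_bam,
    ]


if __name__ == "__main__":
    # All file names below are made up for illustration.
    recal = base_recalibrator_cmd(
        "S1.dedup.bam", "ref.fa", ["dbsnp.vcf.gz", "known_indels.vcf.gz"],
        "targets.interval_list", "S1.recal.table")
    apply = apply_bqsr_cmd("S1.dedup.bam", "ref.fa", "S1.recal.table", "S1.recal.bam")
    print(" ".join(recal))
    print(" ".join(apply))
```

Building the commands as lists keeps them easy to inspect or pass to subprocess.run once the paths are real. Whether recalibration on targets alone is appropriate for a given capture kit is something to check against the current GATK documentation.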

@manuel.belmadani look at this one, which is outdated: https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_bqsr_BaseRecalibrator.php

ADD REPLYlink written 6 months ago by Learner 160

Yeah, I find it's difficult to adjust the individual steps if you're not following an entire guide. See my answer about using the ExAC pipeline instead.

ADD REPLYlink written 6 months ago by manuel.belmadani1.1k

@Santosh Anand I did check that. I am basically stuck on a few steps:

Identify target regions for realignment, realign BAM to get better indel calling, call indels, call SNPs, view aligned reads in BAM/BAI
ADD REPLYlink written 6 months ago by Learner 160

That is an 8-year-old post where the newest answer is 4+ years old. You're better off implementing GATK Best Practices. AFAIK, building a pipeline from scratch can be quite challenging for people lacking significant experience.

ADD REPLYlink written 6 months ago by RamRS23k

@RamRS so do you have any post or other resource that is new and that I can follow? There are many parameters that have an effect on the output, so I would like to benefit from other people's experience rather than running around on my own.

ADD REPLYlink modified 6 months ago • written 6 months ago by Learner 160

Not really, no. Is using a cloud platform such as Seven Bridges or GATK FireCloud an option? That might be easier from a get-it-done perspective.

ADD REPLYlink written 6 months ago by RamRS23k

@RamRS no, I cannot use the cloud. If I could, Galaxy would be a good option, but the problem is that I don't want to just click. I have read a lot, but surprisingly there are not many documents out there on how to proceed.

ADD REPLYlink written 6 months ago by Learner 160
6 months ago by
Canada
manuel.belmadani1.1k wrote:

I would follow the methods used by ExAC, which processed over 60k exomes using the pipeline described in their manuscript supplements. It's probably the most reliable reference I can think of in terms of exome variant calling.

Paper: https://www.nature.com/articles/nature19057

Go to Supplementary Information, starting from "1 Data Generation". They provide all the steps they use including filters. It should get you most of the way there.

ADD COMMENTlink written 6 months ago by manuel.belmadani1.1k

@ this is also old; can I rely on their commands? For example, they are using an old GATK (see the command java -jar GenomeAnalysisTK.jar \). But thanks for sharing; I'm going to read it carefully. I like your answer already.

ADD REPLYlink written 6 months ago by Learner 160

It should be fine, I think. It's not like ExAC data is not good anymore; it's still widely used and standard. The same group came out with gnomAD more recently, which extends the work from ExAC, but it's not published yet. There might be a preprint on bioRxiv, but I'm not sure if the pipeline changed at all.

ADD REPLYlink written 6 months ago by manuel.belmadani1.1k

Nice resource! It's great that they're using GATK HC and not UG, but bwa mem is better than bwa aln and can replace it, right?

ADD REPLYlink written 6 months ago by RamRS23k

Most likely yes. I remember reading some benchmark where they recommended aln for shorter reads (~36bp) and mem for anything > 100bp. I also recall mem being more straightforward to use for some reason.

ADD REPLYlink written 6 months ago by manuel.belmadani1.1k
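
Part of what makes bwa mem more straightforward is that paired-end alignment is a single command that writes SAM to stdout, so it can be piped directly into samtools sort. Below is a minimal Python sketch that only builds such a pipeline string, assuming bwa and samtools are installed. The function name, the sample name, the read-group layout, and the file names are hypothetical placeholders.

```python
import shlex


def bwa_mem_pipeline(ref: str, fq1: str, fq2: str, sample: str,
                     out_bam: str, threads: int = 4) -> str:
    """Build (not run) a 'bwa mem | samtools sort' shell pipeline."""
    # bwa expects a literal backslash-t in the -R string and converts it to tabs.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    align = ["bwa", "mem", "-t", str(threads), "-R", read_group, ref, fq1, fq2]
    # '-' tells samtools sort to read the alignments from stdin.
    sort = ["samtools", "sort", "-o", out_bam, "-"]
    # Quote every argument so the read-group string survives the shell intact.
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in part) for part in (align, sort)
    )


if __name__ == "__main__":
    # File names are made up for illustration.
    print(bwa_mem_pipeline("ref.fa", "S1_R1.fastq.gz", "S1_R2.fastq.gz",
                           "S1", "S1.sorted.bam"))
```

Setting the read group at alignment time matters downstream, since GATK tools generally expect read groups to be present in the BAM.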

@manuel.belmadani I am trying to use their pipeline, but GATK has changed and I can no longer use RealignerTargetCreator. Do you have any suggestions, or steps that I should take instead?

ADD REPLYlink written 6 months ago by Learner 160

IndelRealigner is not really necessary with GATK-HC >3.4, I think. HC performs local realignment around indels anyway, so you should be fine. Do hold on until others provide their feedback as well; my GATK knowledge is quite dated.

ADD REPLYlink written 6 months ago by RamRS23k

That seems right. See this post.

Realigning reads using IndelRealigner or assembling reads using HaplotypeCaller allows us to call the insertion. That indel realignment has been a part of pre-processing workflows for seven years and will continue to be a part of workflows still dependent on locus-based callers is a testament to the improvements it brings. And if you feel apprehensive about omitting it from your HaplotypeCaller and MuTect2 workflows, we empathize. These changes are about improving efficiency in the face of incremental returns. If you find substantial changes, then I encourage you to share details with us.

ADD REPLYlink written 6 months ago by manuel.belmadani1.1k
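
To make the consequence concrete: in GATK4 the post-alignment steps have no RealignerTargetCreator or IndelRealigner stage (neither tool is shipped with GATK4), and HaplotypeCaller calls SNPs and indels together. The following is a minimal Python sketch under those assumptions. The helper names, the sample name, the padding value, and all file names are hypothetical, and the functions only build the command lines rather than run them.

```python
from typing import List


def mark_duplicates_cmd(bam: str, out_bam: str, metrics: str) -> List[str]:
    """Build (not run) a GATK4 MarkDuplicates command."""
    return ["gatk", "MarkDuplicates", "-I", bam, "-O", out_bam, "-M", metrics]


def haplotype_caller_cmd(bam: str, ref: str, targets: str,
                         out_gvcf: str, padding: int = 100) -> List[str]:
    """Build (not run) a per-sample HaplotypeCaller command in GVCF mode."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", ref,
        "-I", bam,
        "-L", targets,                        # exome capture intervals
        "--interval-padding", str(padding),   # illustrative value, not a recommendation
        "-O", out_gvcf,
        "-ERC", "GVCF",                       # emit a GVCF for later joint genotyping
    ]


def post_alignment_steps(sample: str, ref: str, targets: str) -> List[List[str]]:
    """Ordered commands after alignment, with no indel realignment step."""
    dedup = f"{sample}.dedup.bam"
    return [
        mark_duplicates_cmd(f"{sample}.sorted.bam", dedup,
                            f"{sample}.dup_metrics.txt"),
        # Base quality recalibration (BaseRecalibrator + ApplyBQSR) would
        # normally sit here; it is left out to keep the sketch short.
        haplotype_caller_cmd(dedup, ref, targets, f"{sample}.g.vcf.gz"),
    ]


if __name__ == "__main__":
    for step in post_alignment_steps("S1", "ref.fa", "targets.interval_list"):
        print(" ".join(step))
```

This covers the "call indels" and "call SNPs" steps of the older pipeline with a single caller. The resulting per-sample GVCFs still need joint genotyping and filtering, which are outside this sketch.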