How to proceed after alignment when analyzing whole exome sequencing data
I have read many posts on this website, but most of them are old. I used this post to build my pipeline, but it is very old now (What Is The Best Pipeline For Human Whole Exome Sequencing?).

Could anyone give me an up-to-date guide to the Genome Analysis Toolkit (GATK)? I cannot find the best way to do steps 6 to 11 of that post, and GATK has also changed, as mentioned below.

Did you check v4 best practices? What specifically is troubling you?

Probably the "Exome" part, the Broad guides I think assume you have WGS.

@manuel.belmadani WGS is different from WES, but the process should be rather similar.

Yes that's why I was suggesting that using the Broad best practices might not be completely appropriate. I'd be concerned that the base/variant recalibration steps would differ for WES. It's been asked in their forum and there's some answer about specific steps but no comprehensive guide for WES afaik.
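One adjustment that is commonly made for exomes is to restrict base recalibration to the capture targets with `-L`, plus some padding. Below is a minimal Python sketch that only builds the GATK4 command lines and does not run them. The file names, the helper name `bqsr_commands`, and the 100 bp padding are assumptions for illustration, not values taken from this thread.

```python
from typing import List


def bqsr_commands(bam: str, ref: str, known_sites: List[str],
                  targets: str, padding: int = 100) -> List[List[str]]:
    """Build GATK4 BaseRecalibrator and ApplyBQSR argument lists for an exome.

    All paths are hypothetical; `padding` (bp around each target) is an
    assumed default, not a recommendation from the thread.
    """
    stem = bam[:-4] if bam.endswith(".bam") else bam
    recal_table = stem + ".recal.table"
    recal_bam = stem + ".recal.bam"

    # Model the base quality errors using reads over the capture targets only.
    base_recal = ["gatk", "BaseRecalibrator",
                  "-R", ref, "-I", bam,
                  "-L", targets, "--interval-padding", str(padding),
                  "-O", recal_table]
    for vcf in known_sites:
        base_recal += ["--known-sites", vcf]

    # Apply the recalibration table to produce a new BAM.
    apply_bqsr = ["gatk", "ApplyBQSR",
                  "-R", ref, "-I", bam,
                  "--bqsr-recal-file", recal_table,
                  "-O", recal_bam]
    return [base_recal, apply_bqsr]


if __name__ == "__main__":
    for cmd in bqsr_commands("sample.bam", "ref.fasta",
                             ["dbsnp.vcf.gz"], "targets.interval_list"):
        print(" ".join(cmd))
```

The printed lines can be pasted into a shell or passed to `subprocess.run` once the paths point at real files.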

Yeah, I find it's difficult to adjust the individual steps if you're not following an entire guide. See my answer about using the ExAC pipeline instead.

@Santosh Anand I did check that. I am basically stuck on a few steps:

Identify target regions for realignment, Realign BAM to get better Indel calling, Call Indels, Call SNPs, View aligned reads in BAM/BAI

That is an 8 year old post where the newest answer is 4+ years old. You're better off implementing GATK Best Practices. AFAIK, building a pipeline from scratch can be quite challenging for people lacking significant experience.

@RamRS so do you have any newer post or guide that I can follow? There are many parameters that have an effect on the output, so I would like to draw on other people's experience rather than running around myself.

Not really, no. Is using a cloud platform such as Seven Bridges or GATK Firecloud an option? That might be easier from a get-it-done perspective.

@RamRS no, I cannot use the cloud. If I could, Galaxy would be a good option. The problem is that I don't want to just click through, and although I have read a lot, there is surprisingly little documentation on how to proceed.

I would follow the methods used by ExAC, which processed over 60k exomes using the pipeline described in their manuscript supplements. It's probably the most reliable reference I can think of in terms of exome variant calling.

Go to Supplementary Information, starting from "1 Data Generation". They provide all the steps they use including filters. It should get you most of the way there.

@ this is also old; can I rely on their commands? For example, they are using an old GATK: look at the command java -jar GenomeAnalysisTK.jar \ But thanks for sharing, I'm going to read it carefully. I like your answer already.

It should be fine I think. It's not like ExAC data is not good anymore, it's still widely used and standard. The same group came out with gnomAD more recently which extends on the work from ExAC but it's not published yet. There might be a preprint on biorxiv but I'm not sure if the pipeline changed at all.

Nice resource! It's great that they're using GATK HC and not UG, but bwa mem is better than bwa aln and can replace it, right?

Most likely yes. I remember reading a benchmark where they recommended aln for shorter reads (~36bp) and mem for anything > 100bp. I recall mem being more straightforward to use for some reason.
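As a rough illustration of the aln-versus-mem choice, here is a small Python sketch that picks an algorithm from the read length and builds a paired-end `bwa mem` command with a read group (GATK needs one). The 70 bp cut-off is an assumption based on the range the BWA manual gives for its long-read algorithms, not a number from the benchmark mentioned above, and the helper names and file paths are hypothetical.

```python
from typing import List


def choose_bwa_algorithm(read_length: int, mem_min_length: int = 70) -> str:
    """Return "mem" for longer reads and "aln" for short ones.

    mem_min_length is an assumed cut-off; adjust it to your own data.
    """
    return "mem" if read_length >= mem_min_length else "aln"


def bwa_mem_command(ref: str, fq1: str, fq2: str, sample: str,
                    threads: int = 4) -> List[str]:
    """Build a paired-end `bwa mem` argument list with a read group."""
    # The literal "\t" separators are expanded to tabs by bwa itself.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return ["bwa", "mem", "-t", str(threads), "-R", read_group,
            ref, fq1, fq2]


if __name__ == "__main__":
    print(choose_bwa_algorithm(36))
    print(choose_bwa_algorithm(150))
    print(" ".join(bwa_mem_command("ref.fasta", "S1_R1.fastq.gz",
                                   "S1_R2.fastq.gz", "S1")))
```

`bwa mem` writes SAM to standard output, so in practice the command would be piped into something like `samtools sort` to get a coordinate-sorted BAM.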

@manuel.belmadani I am trying to use their pipeline, but GATK has changed and I can no longer use RealignerTargetCreator. Do you have any suggestions for the steps I should take?

IndelRealigner is not really necessary with GATK-HC >3.4, I think. HC performs local realignment around indels anyway, so you should be fine. Do hold on until others provide their feedback as well, my GATK knowledge is quite dated.

That seems right. See this post.

Realigning reads using IndelRealigner or assembling reads using HaplotypeCaller allows us to call the insertion. That indel realignment has been a part of pre-processing workflows for seven years and will continue to be a part of workflows still dependent on locus-based callers is a testament to the improvements it brings. And if you feel apprehensive about omitting it from your HaplotypeCaller and MuTect2 workflows, we empathize. These changes are about improving efficiency in the face of incremental returns. If you find substantial changes, then I encourage you to share details with us.
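
Putting the advice in this thread together, the old realignment and separate SNP/indel calling steps collapse into a single HaplotypeCaller run in GATK4, where RealignerTargetCreator, IndelRealigner and UnifiedGenotyper are no longer shipped. The Python sketch below only builds the command line and does not run it. The file names, the helper name `haplotypecaller_command`, the 100 bp padding, and the choice of GVCF output are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Legacy GATK3 tool -> GATK4 replacement (None means the step is dropped).
LEGACY_TO_GATK4: Dict[str, Optional[str]] = {
    "RealignerTargetCreator": None,
    "IndelRealigner": None,
    "UnifiedGenotyper": "HaplotypeCaller",
}


def haplotypecaller_command(bam: str, ref: str, targets: str,
                            out_gvcf: str, padding: int = 100) -> List[str]:
    """Build a GATK4 HaplotypeCaller argument list restricted to exome targets.

    HaplotypeCaller calls SNPs and indels together, so no separate
    realignment or indel-calling command is generated.
    """
    return ["gatk", "HaplotypeCaller",
            "-R", ref, "-I", bam,
            "-L", targets, "--interval-padding", str(padding),
            "-ERC", "GVCF",  # assumed: per-sample GVCF for later joint genotyping
            "-O", out_gvcf]


if __name__ == "__main__":
    for old, new in LEGACY_TO_GATK4.items():
        print(old, "->", new if new else "(dropped)")
    print(" ".join(haplotypecaller_command(
        "sample.recal.bam", "ref.fasta",
        "targets.interval_list", "sample.g.vcf.gz")))
```

For a single sample, dropping `-ERC GVCF` and writing a plain VCF is another option. The resulting BAM can still be inspected in a viewer such as IGV after indexing it.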