As I continue to add steps to my SNP/indel discovery workflow, the latest recommendation is local realignment around indels using GATK, following the initial alignment step. I have just started the step that generates the target intervals for realignment (RealignerTargetCreator), and it looks like it will take an hour to complete, with the realignment itself still required after that. My test data set is a single sample of approximately 5 million paired-end 100 bp reads.
For an upcoming project, I plan to run 150 similarly sized samples, so adding such time-consuming steps will have a major impact on timelines. Can anyone with experience in this area comment on the time required for indel realignment versus the benefit gained? Is it worth it?
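To put a rough number on the timeline concern, here is a back-of-the-envelope sketch in Python. Only the 150 samples and the one hour for RealignerTargetCreator come from my test run; the realignment time per sample and the number of jobs that can run side by side are placeholder assumptions, and the function name is just for illustration.

```python
import math


def estimate_wall_clock_hours(n_samples, target_hours, realign_hours, parallel_jobs=1):
    """Rough wall-clock estimate, assuming every sample takes the same time
    and samples are processed in batches of `parallel_jobs`."""
    if n_samples < 0 or parallel_jobs < 1:
        raise ValueError("n_samples must be >= 0 and parallel_jobs >= 1")
    per_sample_hours = target_hours + realign_hours
    batches = math.ceil(n_samples / parallel_jobs)
    return batches * per_sample_hours


if __name__ == "__main__":
    N_SAMPLES = 150        # from the planned project
    TARGET_HOURS = 1.0     # observed for RealignerTargetCreator on one sample
    REALIGN_HOURS = 1.0    # ASSUMPTION: realignment time not yet measured
    PARALLEL_JOBS = 8      # ASSUMPTION: hypothetical number of concurrent jobs

    serial = estimate_wall_clock_hours(N_SAMPLES, TARGET_HOURS, REALIGN_HOURS)
    batched = estimate_wall_clock_hours(
        N_SAMPLES, TARGET_HOURS, REALIGN_HOURS, PARALLEL_JOBS
    )
    print(f"One sample at a time: {serial:.0f} h")
    print(f"{PARALLEL_JOBS} samples at a time: {batched:.0f} h")
```

Even with the realignment time set to zero, target creation alone would come to 150 hours if the samples are run one after another, which is why I am asking whether the benefit justifies it.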