Question: How Can I Speed Up GATK UnifiedGenotyper
8.5 years ago by
Alex Coventry10 wrote:

I'm running the GATK on 500 samples to call variants in a few megabases of hg18, and it's going surprisingly slowly. For instance, I have UnifiedGenotyper running on some 1 kb regions at the moment, and many have been running for over 12 hours without completing. This could be because parts of the regions I'm targeting for calling were capture-targeted, so the pileup of Illumina reads aligned to those regions can be very deep. My next experiment is to try to mitigate the effect of these deeply covered regions by running GATK with a relatively low -dcov value, say around 50. If this could be expected to substantially affect its accuracy, I would be grateful to learn about it.

Here are the options I'm running GATK with, in case I'm doing something silly:

-T UnifiedGenotyper -glm BOTH -L $region \
-R .../human_b36_both.chr.fasta -o $outpath -I <bamfile> -I <bamfile> ...

Also, I understand there's a Markov chain underlying the UG's calls, and I suspect slow convergence might be the main factor. Is there an option to tell UG to punt on a site once the Markov chain reaches a certain length?
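For concreteness, the low -dcov experiment mentioned above amounts to adding one flag to the options already shown. Here is a minimal sketch that only assembles and prints the argument list without running anything. The region, output, reference, and BAM names are placeholders invented for illustration, not the real paths, and 50 is just the value floated in the question:

```shell
# Sketch only: assemble the UnifiedGenotyper options with downsampling added.
# Nothing is executed; the argument string is printed for inspection.
region="${region:-chr1:1000-2000}"   # hypothetical 1 kb target region
outpath="${outpath:-calls.vcf}"      # hypothetical output path
ref="${ref:-reference.fasta}"        # placeholder for the reference FASTA
dcov="${dcov:-50}"                   # downsampling target from the question

args="-T UnifiedGenotyper -glm BOTH -L $region -R $ref -o $outpath -dcov $dcov"

# Placeholder BAM names; the real run would list all 500 sample BAMs.
for bam in sample1.bam sample2.bam; do
  args="$args -I $bam"
done

echo "$args"
```

Printing the options first makes it easy to confirm that -dcov is present and that every BAM was picked up before committing to a long run.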

gatk • 3.6k views
modified 3.6 years ago by Biostar ♦♦ 20 • written 8.5 years ago by Alex Coventry10

Are you putting all 500 .bams through UG at the same time?

written 8.5 years ago by Russh1.2k

Yes. Is that too much?

written 8.5 years ago by Alex Coventry10

I was going to recommend posting at getsatisfaction.com, but it seems you already have. I thought your command line might be too long, or that you'd maxed out the memory, but having seen your gsa post, that doesn't seem likely.

written 8.5 years ago by Russh1.2k