How Can I Speed Up GATK UnifiedGenotyper?
12.6 years ago

I'm running the GATK on 500 samples to call variants in a few megabases of hg18, and it is going surprisingly slowly. For instance, I have UnifiedGenotyper running on some 1 kb regions at the moment, and many have been running for over 12 hours without completing. This could be because parts of the regions I'm calling were capture-targeted, so the pileup of Illumina reads aligned to those regions can be very deep. My next experiment is to try to mitigate the effect of these deeply covered regions by running GATK with a relatively low -dcov value, say around 50. If this could be expected to substantially affect accuracy, I would be grateful to learn about it.

Here are the options I'm running GATK with, in case I'm doing something silly:

-T UnifiedGenotyper -glm BOTH -L $region \
-R .../human_b36_both.chr.fasta -o $outpath -I <bamfile> -I <bamfile> ...
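
For concreteness, the low `-dcov` experiment mentioned above would amount to something like the following. This is only a sketch: `ug_cmd` is a hypothetical helper, `GenomeAnalysisTK.jar` and all argument values are placeholders, and the function prints the command rather than running it. Only `-dcov` is added to the options shown above.

```shell
#!/bin/sh
# Sketch only: print (do not run) a UnifiedGenotyper command line with -dcov added.
# ug_cmd is a hypothetical helper; the jar name and all values are placeholders.
ug_cmd() {
    region="$1"    # interval passed to -L
    ref="$2"       # reference FASTA passed to -R
    outpath="$3"   # output path passed to -o
    dcov="$4"      # downsampling target passed to -dcov, e.g. 50
    bam="$5"       # a single input passed to -I (repeat -I for more)
    echo "java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH" \
         "-L $region -dcov $dcov -R $ref -o $outpath -I $bam"
}

# Example call with placeholder values.
ug_cmd "chr1:1-1000" "ref.fasta" "calls.vcf" 50 "sample1.bam"
```

Printing the command first makes it easy to compare the downsampled and non-downsampled runs on the same region before committing to one setting.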

Also, I understand there's a Markov chain underlying UG's calls, and I suspect slow convergence might be the main factor. Is there an option to tell UG to give up on a site once the Markov chain has run for a certain length?

gatk • 4.6k views

Are you putting all 500 .bams through UG at the same time?


Yes. Is that too much?


I was going to recommend posting at getsatisfaction.com, but it seems you already have. I thought your command line might be too long, or that you'd maxed out the memory, but having seen your GSA post, neither seems likely.
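
If command-line length ever does become a concern with hundreds of `-I` arguments, one workaround is to collect the BAM paths in a text file. A minimal sketch, assuming your GATK version accepts a file with a `.list` extension as the argument to `-I` (worth checking against the documentation for your release); `make_bam_list` and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: write one BAM path per line into a list file.
# make_bam_list is a hypothetical helper, not part of GATK.
make_bam_list() {
    bamdir="$1"     # directory containing the .bam files
    listfile="$2"   # output file, e.g. bams.list
    : > "$listfile" # create or truncate the list
    for bam in "$bamdir"/*.bam; do
        # Skip the unexpanded pattern when the directory has no .bam files.
        if [ -e "$bam" ]; then
            echo "$bam" >> "$listfile"
        fi
    done
}

# Assumed usage: make_bam_list /path/to/bams bams.list
# and then pass "-I bams.list" in place of the repeated -I arguments.
```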
