What is the -nct alternative in GATK 4.0.0.0?
6.0 years ago

Hi all,

I am using GATK 4.0.4.0 for HaplotypeCaller, and I want to run 50 samples in parallel to generate g.vcf files. However, I cannot find the -nct option that was available in older versions of GATK. Has something replaced -nct in GATK 4.0.4.0?

What is the best approach?

This question was previously asked by prasundutta, but it did not get an answer, and I could not find anything elsewhere (https://gatkforums.broadinstitute.org/gatk/discussion/11304/nct-not-present-in-gatk-4-0-0-0).

Best regards,

Mostafa

I'm having the same problem reported here. Is the current "solution" still that HaplotypeCaller cannot be parallelized in GATK 4?

And if there is a way to do it, how?

I've read about Spark but I still don't understand what it is or how to use it.

Please use "Add comment", not the answer box, for comments.

6.0 years ago
Dave Carlson

As far as I know, there is no "-nct" equivalent in GATK 4. For HaplotypeCaller, even the Spark implementation is not yet recommended for use.

Hi Carlson,

Many thanks for your reply.

If Spark does not work, then why recommend it?

If we do not specify the number of CPUs on the command line when analyzing 50 samples in parallel, will the analysis fail, or will it only take longer?

does our analysis make a mistake

What kind of "mistake" are you thinking about? Changing/disabling parallelization rarely ever causes any significant error.

Yes, I mean a runtime error. In other words, without specifying the number of CPUs, the runtime may be prolonged, and the run may then encounter an error.

Runtimes can be prolonged, yes, but if there is going to be an error, it will happen no matter how many threads run the program. Multithreading alone, when properly implemented, cannot cause an error. In your case, not using multithreading cannot cause an error unless something like running out of resources happens.

Here is what the GATK developers say about using the HaplotypeCaller Spark implementation:

"This tool DOES NOT match the output of HaplotypeCaller. It is still under development and should not be used for production work. For evaluation only. Use the non-spark HaplotypeCaller if you care about the results."

Based on my reading of the GATK documentation and periodic perusal of the message boards and the GitHub page, I don't think much can currently be done to reliably parallelize the GATK 4 implementation of HaplotypeCaller.

Edit: If I'm wrong about that, I would very much like to know, since it would likely save me a bunch of time and effort.
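If the goal is only to get through many samples faster (rather than to speed up a single sample), one workaround that does not depend on -nct or Spark is to launch one independent HaplotypeCaller process per sample and cap how many run at once. Below is a minimal Python sketch of that idea, not an official GATK recipe. It assumes the `gatk` wrapper script is on PATH, each BAM is indexed, and the reference has its .fai and .dict files; `haplotypecaller_cmd`, `run_all` and `max_parallel` are hypothetical names chosen for illustration. Only the standard `-R`, `-I`, `-O` and `-ERC GVCF` HaplotypeCaller arguments are used.

```python
"""Sketch: run one GATK4 HaplotypeCaller process per sample, a few at a time."""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def haplotypecaller_cmd(reference: str, bam: str, out_dir: str) -> list[str]:
    """Build a per-sample command that writes <bam stem>.g.vcf.gz in GVCF mode."""
    sample = Path(bam).stem  # e.g. "bams/sample1.bam" -> "sample1"
    gvcf = str(Path(out_dir) / f"{sample}.g.vcf.gz")
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", gvcf,
        "-ERC", "GVCF",
    ]


def run_all(reference: str, bams: list[str], out_dir: str, max_parallel: int = 4):
    """Run the per-sample commands, at most max_parallel at a time.

    Each job is a separate OS process (its own JVM); the thread pool only
    waits on those processes. check=True raises CalledProcessError if a job fails.
    """
    cmds = [haplotypecaller_cmd(reference, bam, out_dir) for bam in bams]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd, check=True), cmds))
```

Usage might look like `run_all("ref.fasta", sorted(glob.glob("bams/*.bam")), "gvcf", max_parallel=8)` (with `import glob`). Since every job is its own JVM, `max_parallel` should be chosen so that the combined memory and CPU use fits the machine; on a cluster, submitting one scheduler job per sample achieves the same thing.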

Sparks implementation

It's "Spark", no s :-)

It was corrected.

Fixed, thanks. :)
