GATK 4 and Spark multithreading
16 months ago
Vic ▴ 50

I would like to know how to use Spark within GATK for multi-threaded analysis. Unfortunately, the Broad Institute's cluster/Spark tutorial documentation is still in progress. I am using HaplotypeCaller, which has been working fine, but I now have some pooled-seq samples that take much longer, so I would like to spread the workload. This is an example of my usage:

gatk HaplotypeCaller -I my_pooled_sample.bam -I another_pooled_sample.bam -L a_chromosome -R ref_genome.fna -O my_out_file.g.vcf -ploidy 10 -- --spark-master local[2]

I took the Spark arguments in the command above from this example, but it didn't work. I checked the help info and got this:

>     gatk forwards commands to GATK and adds some sugar for submitting spark jobs
>      --spark-runner <target>    controls how spark tools are run
>          valid targets are:
>          LOCAL:      run using the in-memory spark runner
>          SPARK:      run using spark-submit on an existing cluster
>                      --spark-master must be specified
>                      --spark-submit-command may be specified to control the Spark submit command
>                      arguments to spark-submit may optionally be specified after --
>          GCS:        run using Google cloud dataproc
>                      commands after the -- will be passed to dataproc
>                      --cluster <your-cluster> must be specified after the --
>                      spark properties and some common spark-submit parameters will be translated
>                      to dataproc equivalents

I then tried using:

--spark-runner local[2]

That also didn't work. I would appreciate some guidance. Many thanks.
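
Going by the help text quoted above, `--spark-runner` expects one of the target names (LOCAL, SPARK or GCS), while the thread count belongs in `--spark-master`, so `--spark-runner local[2]` would not be a valid combination. What follows is only a minimal, untested sketch. It assumes that the plain `HaplotypeCaller` tool does not take Spark arguments, and that the Spark-enabled beta variant, `HaplotypeCallerSpark`, is available in the installed GATK 4 release and accepts the same inputs, including `-ploidy`. The helper name `build_spark_cmd` is hypothetical, and the file names are the ones from the command above. The function prints the command rather than running it, so it can be inspected first.

```shell
#!/usr/bin/env bash
# Sketch only: prints a gatk command line instead of executing it.
# Assumption: HaplotypeCallerSpark (beta) accepts the same arguments as
# HaplotypeCaller; check `gatk HaplotypeCallerSpark --help` before relying on it.

# build_spark_cmd THREADS
# Prints a command that uses the in-memory LOCAL Spark runner with THREADS threads.
build_spark_cmd() {
    local threads="$1"
    # Spark options go after the bare "--". The value local[N] is single-quoted
    # in the printed command so the shell does not treat the brackets as a glob.
    printf '%s\n' "gatk HaplotypeCallerSpark -I my_pooled_sample.bam -I another_pooled_sample.bam -L a_chromosome -R ref_genome.fna -O my_out_file.g.vcf -ploidy 10 -- --spark-runner LOCAL --spark-master 'local[${threads}]'"
}

# Show the command for two local threads.
build_spark_cmd 2
```

If the printed command looks right, it could be run with `eval "$(build_spark_cmd 2)"`. For an existing cluster, the help text suggests swapping in `--spark-runner SPARK` together with the cluster's `--spark-master` URL.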

multithreading haplotypecaller sparks gatk • 1.1k views

I am sorry, I didn't realise that wasn't allowed. I have deleted the other post.

