Question: How can I generate a .vcf file faster with GATK HaplotypeCaller?
valopes wrote (18 months ago):

Hi all,

I am new to SNP calling and to generating VCF files. I have 37 soybean genotypes whose reads I have already filtered by quality, deduplicated, etc.

First, I ran RealignerTargetCreator on all the samples using this command:

java -Xms4g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator \
    -R /indice/Gmax_275_v2.0.fa -I Sample1_qfilter_sorted_rmdup.bam \
    -I Sample2_qfilter_sorted_rmdup.bam -I Sample3_qfilter_sorted_rmdup.bam \
    -I Sample4_qfilter_sorted_rmdup.bam .........  -o realignment_targets.list
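A minimal sketch for the target-creation step, assuming GATK 3.x: instead of typing 37 -I arguments, the BAM paths can go in a file whose name ends in .list (GATK3 treats such a file given to -I as a list of inputs), and RealignerTargetCreator accepts -nt to use several data threads. The helper names make_bam_list and run_target_creator, the bams.list file name, the -Xmx8g value and the thread count are hypothetical; setting GATK=echo prints the arguments instead of running Java.

```shell
#!/usr/bin/env bash
# Sketch for GATK 3.x; helper names, file names and memory are hypothetical.
# Command prefix; set GATK=echo for a dry run.
GATK=${GATK:-"java -Xmx8g -jar GenomeAnalysisTK.jar"}

# make_bam_list DIR OUT: write one BAM path per line to OUT.
make_bam_list() {
  local dir=$1 out=$2
  local bams=("$dir"/*_qfilter_sorted_rmdup.bam)
  # An unmatched glob stays literal, so check that the first entry exists.
  [ -e "${bams[0]}" ] || { echo "no BAMs found in $dir" >&2; return 1; }
  printf '%s\n' "${bams[@]}" > "$out"
}

# run_target_creator REF BAMLIST THREADS: one multi-threaded run over all BAMs.
run_target_creator() {
  local ref=$1 bamlist=$2 threads=$3
  # $GATK is unquoted on purpose so it splits into separate words.
  $GATK -T RealignerTargetCreator -R "$ref" -I "$bamlist" \
    -nt "$threads" -o realignment_targets.list
}
```

For example, `make_bam_list . bams.list` followed by `run_target_creator /indice/Gmax_275_v2.0.fa bams.list 8`. The same bams.list can be reused for the IndelRealigner step.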

Then I generated the big realigned BAM file with IndelRealigner:

java -jar GenomeAnalysisTK.jar -T IndelRealigner \
    -R /indice/Gmax_275_v2.0.fa -I Sample1_qfilter_sorted_rmdup.bam \
    -I Sample2_qfilter_sorted_rmdup.bam -I Sample3_qfilter_sorted_rmdup.bam \
    -I Sample4_qfilter_sorted_rmdup.bam ...... -targetIntervals realignment_targets.list -o realigned_reads.bam
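As far as I know, GATK3's IndelRealigner supports neither -nt nor -nct, so the usual way to speed it up is to split it over intervals with -L and run the pieces side by side. A minimal sketch, assuming a samtools faidx index next to the reference (Gmax_275_v2.0.fa.fai, whose first column lists the contig names), the input BAMs listed in a hypothetical bams.list file, and enough RAM for MAX_JOBS JVMs at the assumed -Xmx4g each. The function name realign_by_contig and the output layout are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch for GATK 3.x; realign_by_contig and bams.list are hypothetical.
# Command prefix; set GATK=echo for a dry run.
GATK=${GATK:-"java -Xmx4g -jar GenomeAnalysisTK.jar"}

# realign_by_contig REF BAMLIST TARGETS MAX_JOBS OUTDIR
# Runs one IndelRealigner per contig named in column 1 of REF.fai,
# at most MAX_JOBS at a time, writing OUTDIR/realigned_reads.<contig>.bam.
realign_by_contig() {
  local ref=$1 bamlist=$2 targets=$3 max_jobs=$4 outdir=$5
  [ -r "${ref}.fai" ] || { echo "missing ${ref}.fai (run samtools faidx)" >&2; return 1; }
  mkdir -p "$outdir" || return 1
  # xargs substitutes each contig name for {}; $GATK is unquoted so it
  # word-splits. The exit status is non-zero if any job failed.
  cut -f1 "${ref}.fai" | xargs -P "$max_jobs" -I{} \
    $GATK -T IndelRealigner -R "$ref" -I "$bamlist" \
      -targetIntervals "$targets" -L {} \
      -o "$outdir/realigned_reads.{}.bam"
}
```

For example, `realign_by_contig /indice/Gmax_275_v2.0.fa bams.list realignment_targets.list 8 realigned`. Because -L restricts processing to one contig, each output BAM should hold only that contig's reads, so the pieces can be passed directly to per-contig HaplotypeCaller jobs instead of being merged first.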

These two steps took almost 10 days in total. Now I am trying to run HaplotypeCaller to generate the raw_variants.vcf file using this command:

java -Xmx10g -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R /indice/Gmax_275_v2.0.fa -I realigned_reads.bam -o raw_variants.vcf

but GATK estimates that it will take 73 weeks, so I need to find a way to make it faster.

I read that I could split the job and run many HaplotypeCaller processes in parallel, but I have no idea what the commands for that would look like.

Could you help me with that, please?

Thanks in advance!

Written 18 months ago by valopes; last modified 18 months ago by RamRS

HaplotypeCaller supports multiple types of parallelization; take a look at https://gatkforums.broadinstitute.org/gatk/discussion/1975/how-can-i-use-parallelism-to-make-gatk-tools-run-faster and https://software.broadinstitute.org/gatk/documentation/article.php?id=1988

Reply by RamRS, 18 months ago
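A minimal sketch of the scatter approach described on those pages, assuming GATK 3.x, a samtools faidx index next to the reference, and enough RAM for MAX_JOBS JVMs: each job calls variants on one contig (-L), and xargs -P keeps a fixed number of jobs running. The function name scatter_hc, the -Xmx4g value and the scatter/ output layout are hypothetical; setting GATK=echo turns it into a dry run.

```shell
#!/usr/bin/env bash
# Sketch for GATK 3.x; scatter_hc and the output layout are hypothetical.
# Command prefix; set GATK=echo for a dry run.
GATK=${GATK:-"java -Xmx4g -jar GenomeAnalysisTK.jar"}

# scatter_hc REF BAM MAX_JOBS OUTDIR
# One HaplotypeCaller per contig in column 1 of REF.fai, at most MAX_JOBS
# at once, writing OUTDIR/raw_variants.<contig>.vcf.
scatter_hc() {
  local ref=$1 bam=$2 max_jobs=$3 outdir=$4
  [ -r "${ref}.fai" ] || { echo "missing ${ref}.fai (run samtools faidx)" >&2; return 1; }
  mkdir -p "$outdir" || return 1
  # $GATK is unquoted so it word-splits; xargs fills in {} with the contig.
  # The exit status is non-zero if any job failed.
  cut -f1 "${ref}.fai" | xargs -P "$max_jobs" -I{} \
    $GATK -T HaplotypeCaller -R "$ref" -I "$bam" -L {} \
      -o "$outdir/raw_variants.{}.vcf"
}
```

For example, `scatter_hc /indice/Gmax_275_v2.0.fa realigned_reads.bam 8 scatter` would need roughly 8 × 4 GB of memory. The per-contig VCFs still have to be concatenated in reference order afterwards, for example with bcftools concat or Picard GatherVcfs. If the assembly contains many small scaffolds, grouping them into a few interval files (which -L also accepts) avoids starting one JVM per scaffold.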

So now I am using -nct 8. Let's see how it goes. Thank you.

Reply by valopes, 18 months ago
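For reference, a sketch of the original GATK3 command with -nct added; -nct sets the number of CPU threads per data thread. Assuming, as is common for multithreaded tools, that the speedup from -nct is less than linear, a few threads per job combined with per-contig -L jobs may use the cores better than one large -nct run; that is worth testing on this data. The function name hc_nct is hypothetical, and GATK=echo gives a dry run.

```shell
#!/usr/bin/env bash
# Sketch for GATK 3.x; hc_nct is a hypothetical helper.
# Command prefix; set GATK=echo for a dry run.
GATK=${GATK:-"java -Xmx10g -jar GenomeAnalysisTK.jar"}

# hc_nct REF BAM THREADS OUT
# Runs HaplotypeCaller with THREADS CPU threads (-nct) on the whole BAM.
hc_nct() {
  local ref=$1 bam=$2 threads=$3 out=$4
  # $GATK is unquoted on purpose so it splits into separate words.
  $GATK -T HaplotypeCaller -R "$ref" -I "$bam" -nct "$threads" -o "$out"
}
```

For example, `hc_nct /indice/Gmax_275_v2.0.fa realigned_reads.bam 8 raw_variants.vcf`.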

Hi, have you managed to use the -nct option with HaplotypeCaller? I am getting the error "n is not a recognized option".

Reply by alslonik, 9 months ago
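One likely explanation (an assumption, since the GATK version is not stated) is that the command was run with GATK4, which dropped the GATK3 -nt and -nct options. GATK4's HaplotypeCaller instead has --native-pair-hmm-threads for its pair-HMM step, and splitting the work with -L still applies. A minimal sketch, assuming the gatk launcher script is on PATH; the function name hc4, the -Xmx4g value and the output layout are hypothetical, and GATK4=echo gives a dry run.

```shell
#!/usr/bin/env bash
# Sketch for GATK4; hc4 is a hypothetical helper.
# Launcher; set GATK4=echo for a dry run.
GATK4=${GATK4:-gatk}

# hc4 REF BAM CONTIG THREADS OUTDIR
# Calls one contig with GATK4 HaplotypeCaller. GATK4 names the tool after
# the launcher (no -T) and uses -O for the output file.
hc4() {
  local ref=$1 bam=$2 contig=$3 threads=$4 outdir=$5
  mkdir -p "$outdir" || return 1
  $GATK4 --java-options "-Xmx4g" HaplotypeCaller \
    -R "$ref" -I "$bam" -L "$contig" \
    --native-pair-hmm-threads "$threads" \
    -O "$outdir/raw_variants.${contig}.vcf.gz"
}
```

For example, `hc4 /indice/Gmax_275_v2.0.fa realigned_reads.bam <contig> 2 scatter`, with the contig name taken from the first column of the .fai file; several such calls can run side by side as in the GATK3 scatter approach.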

GATK4 is being released this month and, if I remember correctly, it is much faster.

Reply by WouterDeCoster, 18 months ago

GATK 4 was officially released last evening! :-)

Reply by Nandini, 18 months ago

I was not wrong, but also not very accurate ;-)

Reply by WouterDeCoster, 18 months ago