Question: How can I generate a .vcf faster with GATK - HaplotypeCaller?
valopes30 wrote:

Hi all,

I am new to SNP calling and to generating .vcf files. I have 37 soybean genotypes that I have already filtered by quality, removed duplicates from, etc.

First I ran RealignerTargetCreator on all the samples, using this command:

java -Xms4g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator \
    -R /indice/Gmax_275_v2.0.fa -I Sample1_qfilter_sorted_rmdup.bam \
    -I Sample2_qfilter_sorted_rmdup.bam -I Sample3_qfilter_sorted_rmdup.bam \
    -I Sample4_qfilter_sorted_rmdup.bam .........  -o realignment_targets.list

Then I generated the big merged BAM file:

java -jar GenomeAnalysisTK.jar -T IndelRealigner \
    -R /indice/Gmax_275_v2.0.fa -I Sample1_qfilter_sorted_rmdup.bam \
    -I Sample2_qfilter_sorted_rmdup.bam -I Sample3_qfilter_sorted_rmdup.bam \
    -I Sample4_qfilter_sorted_rmdup.bam ...... -targetIntervals realignment_targets.list -o realigned_reads.bam

Together, the two steps above took almost 10 days. Now I am trying to run HaplotypeCaller to generate the raw_variants.vcf file, using this command:

java -Xmx10g -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R /indice/Gmax_275_v2.0.fa -I realigned_reads.bam -o raw_variants.vcf

but it says that it will take 73 weeks. So I need to figure out how to make it faster.

I saw that I could split the job and run many HaplotypeCaller processes in parallel, but I have no idea what the command for that would look like.

Could you help me with that, please?

Thanks in advance!

snp • 960 views
modified 14 months ago by RamRS20k • written 14 months ago by valopes30

HaplotypeCaller supports multiple types of parallelization. Take a look at https://gatkforums.broadinstitute.org/gatk/discussion/1975/how-can-i-use-parallelism-to-make-gatk-tools-run-faster and https://software.broadinstitute.org/gatk/documentation/article.php?id=1988

written 14 months ago by RamRS20k
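
For the "divide it and run many HaplotypeCaller processes in parallel" idea from the question, one possible approach is to give each process its own contig with -L. The snippet below is only a minimal sketch under a few assumptions that are not stated in the thread: that the FASTA index (.fai) sits next to the reference, that one job per contig is a sensible split, and that the machine has enough memory for several 10 GB Java processes at once. The function name hc_scatter_commands and the raw_variants.<contig>.vcf output names are made up for illustration. It only prints the commands, it does not run them.

```shell
#!/bin/sh
# Sketch: print one GATK 3 HaplotypeCaller command per contig listed in a
# FASTA index (.fai), so that the calls can be run side by side.
# Hypothetical helper: the function name and output file names are made up.
hc_scatter_commands() {
    fai=$1   # e.g. /indice/Gmax_275_v2.0.fa.fai (assumed to exist)
    ref=$2   # e.g. /indice/Gmax_275_v2.0.fa
    bam=$3   # e.g. realigned_reads.bam
    # Column 1 of a .fai file holds the contig name.
    cut -f1 "$fai" | while IFS= read -r contig; do
        printf 'java -Xmx10g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R %s -I %s -L %s -o raw_variants.%s.vcf\n' \
            "$ref" "$bam" "$contig" "$contig"
    done
}

# Example, commented out so nothing heavy starts by accident.
# It would run at most 4 of the printed commands at a time:
# hc_scatter_commands /indice/Gmax_275_v2.0.fa.fai /indice/Gmax_275_v2.0.fa realigned_reads.bam \
#     | xargs -P 4 -I CMD sh -c CMD
```

The per-contig .vcf files would still have to be concatenated into a single raw_variants.vcf afterwards. If the reference contains many small scaffolds, grouping several of them into one interval list per job is probably more practical than one job each.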

So now I am using -nct 8. Let's see how it goes. Thank you.

written 14 months ago by valopes30

Hi, have you managed to use the -nct option with HaplotypeCaller? I am getting the error "n is not a recognized option".

written 5 months ago by alslonik70
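
A possible explanation for that error, assuming it comes from GATK 4 rather than the GATK 3 jar used in the question: GATK 4 no longer has the -nct and -nt engine options. Its HaplotypeCaller instead has --native-pair-hmm-threads, which only multithreads the PairHMM step, so the speed-up is likely smaller than what -nct suggested. The sketch below just prints what such a command might look like. The helper name gatk4_hc_command is made up, and the memory setting is carried over from the question rather than being a recommendation.

```shell
#!/bin/sh
# Sketch, assuming GATK 4 is installed and its `gatk` wrapper is on the PATH.
# Hypothetical helper: prints a GATK 4 HaplotypeCaller command, does not run it.
gatk4_hc_command() {
    ref=$1            # reference FASTA
    bam=$2            # input BAM
    out=$3            # output VCF
    threads=${4:-4}   # PairHMM threads, defaults to 4 if not given
    printf 'gatk --java-options -Xmx10g HaplotypeCaller -R %s -I %s -O %s --native-pair-hmm-threads %s\n' \
        "$ref" "$bam" "$out" "$threads"
}

# Example:
# gatk4_hc_command /indice/Gmax_275_v2.0.fa realigned_reads.bam raw_variants.vcf 8
```

Note that the output option is -O (capital) in GATK 4, whereas the GATK 3 commands in the question use -o.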

GATK4 is being released this month and is, if I remember correctly, much faster.

written 14 months ago by WouterDeCoster37k

GATK 4 was officially released last evening already! :-)

written 14 months ago by Nandini760

I was not wrong, but also not very accurate ;-)

written 14 months ago by WouterDeCoster37k