I am new to SNP calling and to generating .vcf files. I have 37 soybean genotypes whose BAM files I've already filtered by quality, with duplicates removed, etc.
First I ran RealignerTargetCreator on all the samples, using this command:
java -Xms4g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator \
    -R /indice/Gmax_275_v2.0.fa -I Sample1_qfilter_sorted_rmdup.bam \
    -I Sample2_qfilter_sorted_rmdup.bam -I Sample3_qfilter_sorted_rmdup.bam \
    -I Sample4_qfilter_sorted_rmdup.bam ......... -o realignment_targets.list
Then I generated one big realigned BAM file with IndelRealigner:
java -jar GenomeAnalysisTK.jar -T IndelRealigner \
    -R /indice/Gmax_275_v2.0.fa -I Sample1_qfilter_sorted_rmdup.bam \
    -I Sample2_qfilter_sorted_rmdup.bam -I Sample3_qfilter_sorted_rmdup.bam \
    -I Sample4_qfilter_sorted_rmdup.bam ...... -targetIntervals realignment_targets.list -o realigned_reads.bam
Together, the two steps above took almost 10 days. Now I am trying to run HaplotypeCaller to generate the raw_variants.vcf file, using this command:
java -Xmx10g -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R /indice/Gmax_275_v2.0.fa -I realigned_reads.bam -o raw_variants.vcf
but it estimates that the run will take 73 weeks, so I need to figure out how to make it faster.
I read that I could split the job and run several HaplotypeCaller processes in parallel, but I don't know what the command for that would look like.
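As a rough, untested sketch of the kind of split I mean: one HaplotypeCaller run per contig, restricted with GATK's `-L` interval argument. The function name `make_hc_commands`, the `raw_variants.<contig>.vcf` output names, and the presence of a `.fai` index next to the reference (e.g. from `samtools faidx`) are my own assumptions for illustration. The function only prints the commands and does not run anything.

```shell
# Sketch only: prints one HaplotypeCaller command per contig; runs nothing itself.
# Assumed (not part of my actual pipeline): the function name, the .fai index
# path, and the raw_variants.<contig>.vcf output naming.
make_hc_commands() {
    fai="$1"   # FASTA index; column 1 holds the contig names
    ref="$2"   # reference FASTA
    bam="$3"   # realigned BAM
    cut -f1 "$fai" | while read -r contig; do
        printf 'java -Xmx10g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R %s -I %s -L %s -o raw_variants.%s.vcf\n' \
            "$ref" "$bam" "$contig" "$contig"
    done
}

# Example use (commented out so that sourcing this file has no side effects):
#   make_hc_commands /indice/Gmax_275_v2.0.fa.fai /indice/Gmax_275_v2.0.fa realigned_reads.bam > hc_commands.txt
#   xargs -P 4 -I CMD sh -c CMD < hc_commands.txt   # 4 jobs at once is an arbitrary choice
```

With this layout each job would need its own 10 GB of heap, so the number of parallel jobs would have to fit the machine's memory, and the per-contig VCFs would still need to be concatenated afterwards. If the reference contains many small scaffolds, one job per contig is probably too fine-grained and they might need to be grouped.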
Could you help me with that, please?
Thanks in advance!