I am using FastaAlternateReferenceMaker to insert all SNPs from a VCF file into a reference to create a new fasta file of 41 bps (20 bps on flanking side). My end goal is to get the fasta sequence in which reference base will be replaced with the SNP in the VCF file. For this purpose, I tried the FastaAlternateReferenceMaker tool and used the following command:
java -jar GenomeAnalysisTK.jar \ -T FastaAlternateReferenceMaker \ -R hg38.fa \ -o myalt.fasta \ -L my.intervals \ -V myinput.vcf
The command run without error, however, the fasta sequence have different lengths. Although, my interval file has exactly the same length of 40 bps (in interval file it is 42 bps).
chr1 13261 13302 + 1_1 chr1 13282 13323 + 2_1 chr1 13398 13439 + 3_1 chr1 13398 13439 + 4_1
I also tried with the following command:
java -Xmx4g -cp lib/SVToolkit.jar:/lib/gatk/GenomeAnalysisTK.jar org.broadinstitute.sv.apps.GenerateAltAlleleFasta -I myinput.vcf -O myalt.fasta -R hg38.fa -flankLength 20
It gave me exactly 41 bps sequence fasta file. But when I cross-checked with vcf file, it has alt allele at a different position (See below). According to vcf file, alt allele must be present at chr1:13281 with G allele but in fasta file, the alt allele is located at chr1:13282 position.
java -Xmx4g -cp lib/SVToolkit.jar:/lib/gatk/GenomeAnalysisTK.jar org.broadinstitute.sv.apps.GenerateAltAlleleFasta -I myinput.vcf -O myalt.fasta -R hg38.fa -flankLength 20 >1_1 L:chr1:13262-13281:1-20|R:chr1:13283-13302:22-41|LENGTH:41 CTCCTGGACCAGTGATACACGCGGCACCCTGTCCTGGACAC
My expected output should be:
>1_1 L:chr1:13261-13280:1-20|R:chr1:13282-13301:22-41|LENGTH:41 GCTCCTGGACCAGTGATACAGGCGGCACCCTGTCCTGGACA
What am I doing wrong?