I am using FastaAlternateReferenceMaker to insert all SNPs from a VCF file into a reference to create a new fasta file of 41 bps (20 bps on flanking side). My end goal is to get the fasta sequence in which reference base will be replaced with the SNP in the VCF file. For this purpose, I tried the FastaAlternateReferenceMaker tool and used the following command:
java -jar GenomeAnalysisTK.jar \
-T FastaAlternateReferenceMaker \
-R hg38.fa \
-o myalt.fasta \
-L my.intervals \
-V myinput.vcf
The command run without error, however, the fasta sequence have different lengths. Although, my interval file has exactly the same length of 40 bps (in interval file it is 42 bps).
chr1 13261 13302 + 1_1
chr1 13282 13323 + 2_1
chr1 13398 13439 + 3_1
chr1 13398 13439 + 4_1
I also tried with the following command:
java -Xmx4g -cp lib/SVToolkit.jar:/lib/gatk/GenomeAnalysisTK.jar org.broadinstitute.sv.apps.GenerateAltAlleleFasta -I myinput.vcf -O myalt.fasta -R hg38.fa -flankLength 20
It gave me exactly 41 bps sequence fasta file. But when I cross-checked with vcf file, it has alt allele at a different position (See below). According to vcf file, alt allele must be present at chr1:13281 with G allele but in fasta file, the alt allele is located at chr1:13282 position.
java -Xmx4g -cp lib/SVToolkit.jar:/lib/gatk/GenomeAnalysisTK.jar org.broadinstitute.sv.apps.GenerateAltAlleleFasta -I myinput.vcf -O myalt.fasta -R hg38.fa -flankLength 20
>1_1 L:chr1:13262-13281:1-20|R:chr1:13283-13302:22-41|LENGTH:41
CTCCTGGACCAGTGATACACGCGGCACCCTGTCCTGGACAC
.............
My expected output should be:
>1_1 L:chr1:13261-13280:1-20|R:chr1:13282-13301:22-41|LENGTH:41
GCTCCTGGACCAGTGATACAGGCGGCACCCTGTCCTGGACA
............
What am I doing wrong?
Thanks,
Zaib.
Seems like the simple thing to try is to slide your coordinates over by one. At first glance, it looks like a discrepancy between 0-based coordinates and 1-based coordinates.
It is because of 0-based coordinates and 1-based coordinates. How to solve this?
Have you seen this? Cheat Sheet For One-Based Vs Zero-Based Coordinate Systems
I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below: