Question: HISAT2 index build: what to do for SNPs with >1 alt alleles
Hi, I'm trying to build a HISAT2 index with my own SNP VCF file. I noticed that for SNPs with multiple alternative alleles, only the first alternative allele is written to the output .snp file. My command is: --non-rs GRCh37.genome.fa test.vcf.gz test

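I tried two ways of representing these SNPs in the input VCF, either all alternative alleles on one line or one line per alternative allele: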

22 40042284 22:40042284 A T,G . PASS AF=0.02106;MAF=0.02106;R2=0.92725 GT:DS:GP 0|1:1:0,1,0


22 44676852 22:44676852 T G . PASS AF=0.61037;MAF=0.38963;R2=0.99104 GT:DS:GP 1|0:1:0,1,0

22 44676852 22:44676852 T A . PASS AF=0.00574;MAF=0.00574;R2=0.72263 GT:DS:GP 1|0:1:0,1,0

the output is:

22:40042284.0 single 22 40042283 T

22:44676852 single 22 44676851 G

I wonder if I should change the SNP names to something like 22:44676852.0 and 22:44676852.1 to force both alternative alleles into the output? I'm worried that if I do so, something will go wrong when I run hisat2-build.
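
To make the renaming idea concrete, a minimal sketch could look like the following. It assumes tab-delimited VCF data lines, and `split_multiallelic` is a hypothetical helper name, not part of HISAT2. It writes one record per alternative allele and appends `.0`, `.1`, ... to the ID. INFO and genotype columns are copied unchanged, which is not strictly correct for split records, and whether the HISAT2 scripts accept the result is exactly the open question.

```python
def split_multiallelic(line):
    """Split one VCF data line into one line per ALT allele.

    Hypothetical helper: each new record gets an ID with a ".<index>"
    suffix. INFO and sample columns are copied as-is, so genotypes are
    NOT recoded for the split records.
    """
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")          # column 5 is ALT
    if len(alts) == 1:
        return [line.rstrip("\n")]       # biallelic: leave untouched
    records = []
    for i, alt in enumerate(alts):
        rec = list(fields)
        rec[2] = f"{fields[2]}.{i}"      # column 3 is ID
        rec[4] = alt
        records.append("\t".join(rec))
    return records
```

For the first example above this would give two records, named 22:40042284.0 (ALT T) and 22:40042284.1 (ALT G).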

thanks in advance

I don't know about HISAT2, but as far as I know a valid VCF should list each position only once, so your second input is probably invalid. I think your input VCF should look like:

22 40042284 A T,G . PASS AF=0.02106;MAF=0.02106;R2=0.92725 GT:DS:GP 0|1:1:0,1,0

You only need to list the position once for a SNP with multiple alternative alleles. See the VCF format documentation.
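
Collapsing repeated positions into one line can be sketched roughly as below. This is only an illustration under assumptions: the input is tab-delimited and sorted so that records for the same site are adjacent, and `merge_same_site` is a hypothetical helper name. It produces sites-only records, resetting INFO to "." and dropping the sample columns, because per-allele annotations and genotypes cannot simply be concatenated.

```python
from itertools import groupby


def merge_same_site(lines):
    """Merge adjacent VCF data lines that share CHROM, POS and REF.

    Hypothetical helper: returns sites-only records (first 8 columns).
    The ID, QUAL and FILTER of the first record in each group are kept,
    INFO is reset to ".", and sample columns are dropped.
    """
    rows = [
        l.rstrip("\n").split("\t")
        for l in lines
        if l.strip() and not l.startswith("#")   # skip blanks and headers
    ]
    merged = []
    for (chrom, pos, ref), group in groupby(rows, key=lambda r: (r[0], r[1], r[3])):
        group = list(group)
        alts = []
        for row in group:
            for alt in row[4].split(","):
                if alt not in alts:              # keep first-seen order, no duplicates
                    alts.append(alt)
        first = group[0]
        merged.append(
            "\t".join([chrom, pos, first[2], ref, ",".join(alts), first[5], first[6], "."])
        )
    return merged
```

For real data with genotypes, a dedicated tool such as bcftools norm is likely a safer choice than a hand-written merge.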

Thanks Matthew. My VCF contains results from a genotyping array, so I think that's why the different alternative alleles are on separate lines.

