HISAT2 index build: what to do for SNPs with >1 alt alleles
5.1 years ago
msg ▴ 10

Hi, I'm trying to build a HISAT2 index with my own SNP VCF file. I noticed that for SNPs with multiple alternative alleles, hisat2_extract_snps_haplotypes_VCF.py only writes the first alternative allele to the output .snp file. My command is:

hisat2_extract_snps_haplotypes_VCF.py --non-rs GRCh37.genome.fa test.vcf.gz test

I tried two ways of representing such SNPs in the input VCF: both alternative alleles on one line,

22 40042284 22:40042284 A T,G . PASS AF=0.02106;MAF=0.02106;R2=0.92725 GT:DS:GP 0|1:1:0,1,0

and one line per alternative allele at the same position:

22 44676852 22:44676852 T G . PASS AF=0.61037;MAF=0.38963;R2=0.99104 GT:DS:GP 1|0:1:0,1,0
22 44676852 22:44676852 T A . PASS AF=0.00574;MAF=0.00574;R2=0.72263 GT:DS:GP 1|0:1:0,1,0

The output for the two cases is:

22:40042284.0 single 22 40042283 T

22:44676852 single 22 44676851 G

Should I rename the SNPs to something like 22:44676852.0 and 22:44676852.1 to force hisat2_extract_snps_haplotypes_VCF.py to output both alternative alleles? I'm worried that if I do, something will go wrong when I run hisat2-build.
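For reference, the renaming idea could be sketched as below. This is only a minimal sketch: it assumes a tab-delimited, uncompressed VCF, the function name suffix_duplicate_ids is made up for illustration, and the example records are abbreviated from the ones above. Whether hisat2_extract_snps_haplotypes_VCF.py or hisat2-build accepts the renamed IDs is not verified here.

```python
from collections import defaultdict

def suffix_duplicate_ids(vcf_lines):
    """Append .0, .1, ... to the ID of records that share CHROM and POS.

    Hypothetical helper for illustration; not part of HISAT2.
    """
    lines = [ln.rstrip("\n") for ln in vcf_lines]

    # First pass: count data records per (CHROM, POS).
    counts = defaultdict(int)
    for ln in lines:
        if ln and not ln.startswith("#"):
            chrom, pos = ln.split("\t")[:2]
            counts[(chrom, pos)] += 1

    # Second pass: rename only the records at duplicated sites.
    seen = defaultdict(int)
    out = []
    for ln in lines:
        if not ln or ln.startswith("#"):
            out.append(ln)  # header and blank lines pass through unchanged
            continue
        fields = ln.split("\t")
        key = (fields[0], fields[1])
        if counts[key] > 1:
            fields[2] = f"{fields[2]}.{seen[key]}"
            seen[key] += 1
        out.append("\t".join(fields))
    return out

# Records abbreviated from the post (INFO and sample columns shortened).
records = [
    "22\t44676852\t22:44676852\tT\tG\t.\tPASS\tAF=0.61037\tGT\t1|0",
    "22\t44676852\t22:44676852\tT\tA\t.\tPASS\tAF=0.00574\tGT\t1|0",
]
for line in suffix_duplicate_ids(records):
    print(line)
```

Records at positions that occur only once keep their original ID, so the change stays limited to the duplicated sites.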

Thanks in advance.

alignment SNP next-gen • 1.1k views

I don't know HISAT2, but as far as I know a valid VCF should list each position on only one line, so your second input is probably invalid. I think your input VCF should look like:

22 40042284 22:40042284 A T,G . PASS AF=0.02106;MAF=0.02106;R2=0.92725 GT:DS:GP 0|1:1:0,1,0

You only need to list the position once when it has multiple alternative alleles. See the VCF format documentation.
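To check which sites in a VCF actually carry more than one ALT allele, whether written comma-separated on one line or spread over several lines, a small scan like the following may help. It is a sketch under assumptions: tab-delimited VCF text already read into lines, a made-up function name multi_alt_sites, and example records abbreviated from the question.

```python
from collections import defaultdict

def multi_alt_sites(vcf_lines):
    """Return {(CHROM, POS, REF): [ALT, ...]} for sites with more than one ALT.

    Hypothetical helper for illustration; it only inspects the first five columns.
    """
    alts = defaultdict(list)
    for ln in vcf_lines:
        ln = ln.rstrip("\n")
        if not ln or ln.startswith("#"):
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = ln.split("\t")[:5]
        site = (chrom, int(pos), ref)
        # ALT may hold several comma-separated alleles on one line.
        for allele in alt.split(","):
            if allele not in alts[site]:
                alts[site].append(allele)
    return {site: found for site, found in alts.items() if len(found) > 1}

# Records abbreviated from the question (INFO and sample columns shortened).
example = [
    "22\t40042284\t22:40042284\tA\tT,G\t.\tPASS\tAF=0.02106\tGT\t0|1",
    "22\t44676852\t22:44676852\tT\tG\t.\tPASS\tAF=0.61037\tGT\t1|0",
    "22\t44676852\t22:44676852\tT\tA\t.\tPASS\tAF=0.00574\tGT\t1|0",
]
for site, found in multi_alt_sites(example).items():
    print(site, found)
```

For a compressed file such as test.vcf.gz, the lines could be read with gzip.open(path, "rt") from the Python standard library before passing them to the function.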


Thanks, Matthew. My VCF contains genotype array results, so I think that's why the different alternative alleles are on separate lines.
