Question: HISAT2 index build: what to do for SNPs with >1 alt alleles
5 weeks ago, msg wrote:

Hi, I'm trying to build a HISAT2 index with my own SNP VCF file. I noticed that for SNPs with multiple alternative alleles, hisat2_extract_snps_haplotypes_VCF.py only writes the first alternative allele to the output .snp file. My command is:

hisat2_extract_snps_haplotypes_VCF.py --non-rs GRCh37.genome.fa test.vcf.gz test

I tried two ways of representing such SNPs in the input VCF:

22 40042284 22:40042284 A T,G . PASS AF=0.02106;MAF=0.02106;R2=0.92725 GT:DS:GP 0|1:1:0,1,0

and

22 44676852 22:44676852 T G . PASS AF=0.61037;MAF=0.38963;R2=0.99104 GT:DS:GP 1|0:1:0,1,0
22 44676852 22:44676852 T A . PASS AF=0.00574;MAF=0.00574;R2=0.72263 GT:DS:GP 1|0:1:0,1,0

the output is:

22:40042284.0 single 22 40042283 T

22:44676852 single 22 44676851 G

I wonder whether I should rename the SNPs to something like 22:44676852.0 and 22:44676852.1 to force hisat2_extract_snps_haplotypes_VCF.py to output both alternative alleles. I'm worried that if I do so, something will go wrong when I run hisat2-build.

thanks in advance
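
For illustration, here is a minimal Python sketch of what "one .snp line per alternative allele" could look like. The column layout (ID, type, chromosome, 0-based position, allele) is only inferred from the output shown above, the tab separator and the .0/.1 suffix scheme are assumptions, and the function name is hypothetical. It is not the logic of hisat2_extract_snps_haplotypes_VCF.py, it handles single-base substitutions only, it does not produce a matching haplotype file, and whether hisat2-build accepts the result is untested.

```python
def vcf_record_to_snp_lines(record):
    """Return one .snp-style line per single-base ALT allele of a VCF record.

    Hypothetical helper: the output layout is inferred from the post,
    not taken from the HISAT2 source code.
    """
    fields = record.split()
    chrom, pos, snp_id, ref, alt = fields[:5]
    alts = alt.split(",")
    lines = []
    for i, allele in enumerate(alts):
        # This sketch only handles single-base substitutions.
        if len(ref) != 1 or len(allele) != 1:
            continue
        # Assumed naming: add .0, .1, ... only when there are several ALT alleles.
        name = f"{snp_id}.{i}" if len(alts) > 1 else snp_id
        # VCF positions are 1-based; the .snp output in the post is 0-based.
        lines.append("\t".join([name, "single", chrom, str(int(pos) - 1), allele]))
    return lines


record = "22 40042284 22:40042284 A T,G . PASS AF=0.02106 GT:DS:GP 0|1:1:0,1,0"
for line in vcf_record_to_snp_lines(record):
    print(line)
```

For the multi-allelic record above this yields two lines, one for T and one for G, both at 0-based position 40042283.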

Tags: snp, next-gen, alignment

I don't know about HISAT2, but as far as I know a valid VCF lists each position on only one line, so your second input must be invalid. I think your input VCF should look like:

22 40042284 22:40042284 A T,G . PASS AF=0.02106;MAF=0.02106;R2=0.92725 GT:DS:GP 0|1:1:0,1,0

You only need to list the position once for multiple SNPs. See the VCF format documentation.

Written 5 weeks ago by MatthewP
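
To see how many positions in a VCF are spread over several lines, something like the following Python sketch could help. The function name is hypothetical and the example lines are shortened versions of the records in the question; it only counts records per (chromosome, position) and makes no attempt to merge them.

```python
from collections import Counter


def duplicated_positions(lines):
    """Return {(chrom, pos): count} for positions that occur on more than one line."""
    counts = Counter()
    for line in lines:
        # Skip VCF header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split()[:2]
        counts[(chrom, pos)] += 1
    return {key: n for key, n in counts.items() if n > 1}


example = [
    "##fileformat=VCFv4.2",
    "22 40042284 22:40042284 A T,G . PASS AF=0.02106 GT:DS:GP 0|1:1:0,1,0",
    "22 44676852 22:44676852 T G . PASS AF=0.61037 GT:DS:GP 1|0:1:0,1,0",
    "22 44676852 22:44676852 T A . PASS AF=0.00574 GT:DS:GP 1|0:1:0,1,0",
]
print(duplicated_positions(example))
```

For a real file, the same function can be fed an open handle, for example one from gzip.open("test.vcf.gz", "rt").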

Thanks, Matthew. My VCF contains the results of a genotyping array, so I think that's why the different alternative alleles are on separate lines.

Written 5 weeks ago by msg