Question: Passing VCF file directly into ```HiSAT2-build --snp --haplotype```
0
gravatar for rightmirem
6 months ago by
rightmirem40
rightmirem40 wrote:

We've been working with HiSAT2and HiSAT2-build for mapping NGS data to a reference.

In the next step, we wish to create a new indexed reference from VCF data we've generated.

My PI says to use...

hisat2-build --snp ./MySNPs.vcf.gz  --haplotype <something> Grch38.fa OutfilesIndexed

... to generate ...

OutfilesIndexed.1.ht2
OutfilesIndexed.2.ht2
etc.

The question is, based on what I see in the manual, I'm not sure I can use VCF format directly ...

--snp
Provide a list of SNPs (in the HISAT2's own format) as follows (five columns).

SNP ID <tab> snp type (single, deletion, or insertion) <tab> chromosome name <tab> zero-offset based genomic position of a SNP <tab> alternative base (single), the length of SNP (deletion), or insertion sequence (insertion)

For example, rs58784443 single 13 18447947 T

Use hisat2_extract_snps_haplotypes_UCSC.py

...although it DOES create files when I pipe a VCF in directly...but I don't know if they are reliable.

hisat2-build --snp ./MySNPs.vcf.gz   Grch38.fa  IDXXX

-rw-r--r-- 1 user grpname      8192 Nov 28 11:23 IDXXX.1.ht2
-rw-r--r-- 1 user grpname         0 Nov 28 11:23 IDXXX.2.ht2
-rw-r--r-- 1 user grpname     11294 Nov 28 11:23 IDXXX.3.ht2
-rw-r--r-- 1 user grpname 736462267 Nov 28 11:23 IDXXX.4.ht2
-rw-r--r-- 1 user grpname         0 Nov 28 11:23 IDXXX.7.ht2
-rw-r--r-- 1 user grpname         0 Nov 28 11:23 IDXXX.8.ht2

My understanding from here, is that I should use hisat2_extract_snps_haplotypes_VCF.py to convert the VCF to the HiSAT2 format above. But when I run the VCF file through it I get nothing...

hisat2_extract_snps_haplotypes_VCF.py Grch38.fa  ./MySNPs.vcf.gz  IDXXX

-rw-r--r-- 1 user grpname          0 Nov 28 14:53 IDXXX.haplotype
-rw-r--r-- 1 user grpname          0 Nov 28 14:53 IDXXX.snp
hisat2 snp alignment next-gen • 406 views
ADD COMMENTlink modified 6 months ago • written 6 months ago by rightmirem40
1
gravatar for rightmirem
6 months ago by
rightmirem40
rightmirem40 wrote:

So, found my own answer. I'll leave this here in case the question comes up again.

The --snpswitch must be used (only) with the hisat2 format (as described above). It WILL actually run and produce output if you feed in a VCF file directly - but my understanding is this is purely accidental :D

The hisat2_extract_snps_haplotypes_VCF.py is supposed to convert a VCF file to this format, but assumes you're using annotated data (known SNPs). To use unknown SNPs you add the switch --non-rs ... which gave me output.

I'm really not sure how to check that the output was not garbage however :D

I also don't have a confirmation about what the --snpand --haplotype switches are REALLLLY doing. I assume they are making a Hierarchical Graph FM index (HGFM) meaning the SNPs will be aligned against as well as the REF. But I haven't really been able to get confirmation of that.

I may post that as a second question.

ADD COMMENTlink written 6 months ago by rightmirem40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1574 users visited in the last hour