Hello, I want to align RNA-seq data to the zebrafish reference genome with SNPs using hisat2. I followed the "How to" tutorial on the hisat2 homepage, but I cannot build the index. Maybe the VCF file containing the SNPs is not compatible with the reference genome. Unfortunately, there are no downloads for zebrafish on the hisat2 homepage.
What I did:
1) download, unzip and rename reference genome
wget ftp://ftp.ensembl.org/pub/release-102/fasta/danio_rerio/dna/Danio_rerio.GRCz11.dna.primary_assembly.fa.gz
gzip -d Danio_rerio.GRCz11.dna.primary_assembly.fa.gz
mv Danio_rerio.GRCz11.dna.primary_assembly.fa genome.fa
2) download and unzip SNP file
wget ftp://ftp.ensembl.org/pub/release-102/variation/vcf/danio_rerio/danio_rerio.vcf.gz
gzip -d danio_rerio.vcf.gz
3) extract SNPs and haplotypes in hisat2 format
hisat2_extract_snps_haplotypes_VCF.py genome.fa danio_rerio.vcf genome
-> there are a lot of errors like: "Error: the reference genome you provided seems to be incompatible with the VCF file at 654 of chromosome KZ116062.1 where C is in the reference genome while G is in the VCF file", but a file is generated
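To narrow down whether the VCF and the FASTA really disagree, a manual check along these lines might help. This is only a rough sketch: the function names `vcf_only_contigs` and `ref_base` are made up for illustration and are not part of hisat2, and it assumes an uncompressed VCF, a FASTA whose contig name is the first word of each header line, and standard grep/awk/comm.

```shell
# Hypothetical helpers (not part of hisat2) for comparing a FASTA and a VCF.

# Print contig names that occur in the VCF but not in the FASTA.
# Usage: vcf_only_contigs genome.fa danio_rerio.vcf
vcf_only_contigs() {
    fa_names=$(mktemp) && vcf_names=$(mktemp) || return 1
    # FASTA contig name = first word after '>' on each header line
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | LC_ALL=C sort -u > "$fa_names"
    # VCF contig name = column 1 of every non-header line
    grep -v '^#' "$2" | cut -f1 | LC_ALL=C sort -u > "$vcf_names"
    # comm -13 keeps only lines unique to the second file (the VCF)
    LC_ALL=C comm -13 "$fa_names" "$vcf_names"
    rm -f "$fa_names" "$vcf_names"
}

# Print the reference base at a 1-based position of one contig.
# Usage: ref_base genome.fa KZ116062.1 654
ref_base() {
    awk -v chr="$2" -v pos="$3" '
        /^>/ { name = substr($1, 2); seen = 0; next }  # new record: reset offset
        name == chr {
            len = length($0)
            if (pos <= seen + len) { print substr($0, pos - seen, 1); exit }
            seen += len                                # bases consumed so far
        }
    ' "$1"
}
```

For the error quoted above, `ref_base genome.fa KZ116062.1 654` should print the base that the script reads from the reference, which can then be compared with the REF column of the matching VCF record.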
4) build HGFM index
hisat2-build -p 16 --snp genome.snp --haplotype genome.haplotype genome.fa genome_snp
-> there are a lot of warnings and the program stops: "Warning: single type should have a different base than T (rs505251572) at 58622972 on 3
Time to read SNPs and splice sites: 00:00:16
Killed"
I also tried the SNP file from NCBI (https://ftp.ncbi.nlm.nih.gov/snp/organisms/archive/zebrafish_7955/VCF/00-All.vcf.gz), but that didn't work at all.
Do you know where I can download the latest zebrafish reference genome with the corresponding SNP file or how I can build the index with the SNP file?
Thank you very much!