Hi everyone, I'm new to bioinformatics and I'm completely stuck at the moment.
I'm trying to run migrate-n, and one of its requirements is a reference FASTA file. I read that this reference file can be obtained by building a consensus sequence from my VCF file and the reference genome.
First, I compressed my .vcf into a .vcf.gz with bgzip:
bgzip -c populations.snps.vcf > snps.vcf.gz
But when I run the following command to create the consensus, I get an error:
cat GCF_009762305.2_mZalCal1.pri.v2_genomic.fna | bcftools consensus migrate/snps.vcf.gz > migrate/consensus.fa
Failed to open migrate/snps.vcf.gz: could not load index
When I try to create an index with tabix, another error appears:
tabix -p vcf snps.vcf.gz
[E::hts_idx_push] Unsorted positions on sequence #1: 144478 followed by 144273
tbx_index_build failed: snps.vcf.gz
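The tabix message means that, within one contig, a record with a smaller POS comes after a record with a larger one. Below is a minimal bash sketch that lists every such spot. It assumes the VCF is uncompressed and tab-separated, as the VCF specification requires. The function name `find_unsorted` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every record whose POS is smaller than the POS
# of the previous record on the same contig (what tabix reports as
# "Unsorted positions"). Assumes an uncompressed, tab-separated VCF.
find_unsorted() {
  awk -F'\t' '
    /^#/ { next }                 # skip meta-information and #CHROM header
    {
      pos = $2 + 0                # force a numeric comparison
      if ($1 == chrom && pos < last)
        printf "line %d: %s %d follows %d\n", NR, $1, pos, last
      chrom = $1                  # remember this record for the next one
      last = pos
    }' "$1"
}
```

For example, `find_unsorted populations.snps.vcf | head` should show the first offending records. The check only compares consecutive records on the same contig, so it will not flag a contig that reappears later in the file.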
Finally, I also tried to sort my VCF, but that didn't work either. I suspect something is wrong with my VCF file, since I can't even sort its rows.
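`bcftools sort` is the usual tool for this. Below is a fallback sketch that needs only coreutils. It keeps the header lines in their original order and sorts the records by contig name, then numerically by position. The function name `sort_vcf` and the output file name are hypothetical. The input is assumed to be an uncompressed, tab-separated VCF.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a position-sorted copy of an uncompressed VCF.
# Header lines (starting with #) are kept in their original order; records
# are sorted by CHROM (byte order) and then numerically by POS.
sort_vcf() {
  local in_vcf=$1 out_vcf=$2
  {
    grep '^#' "$in_vcf"
    grep -v '^#' "$in_vcf" | LC_ALL=C sort -t $'\t' -k1,1 -k2,2n
  } > "$out_vcf"
}
```

For example, run `sort_vcf populations.snps.vcf snps.sorted.vcf`. The sorted file would then be compressed with bgzip and indexed with `tabix -p vcf` as in the commands above, before retrying bcftools consensus. Contigs end up in byte order rather than header order. Tabix only needs each contig's records to be contiguous and sorted by position, so that should be acceptable.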
This is what my VCF looks like:
##fileformat=VCFv4.2
##fileDate=20210614
##source="Stacks v2.53"
##INFO=<ID=AD,Number=R,Type=Integer,Description="Total Depth for Each Allele">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allele Depth">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=loc_strand,Number=1,Type=Character,Description="Genomic strand the corresponding Stacks locus aligns on">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Q06B Q03 Q02 Q05 RN7 PQ03 RN4 PV22 PV16 PV18 MLO09 MLO11 CO16 CO20 CO19 MLO01 MLO17 MLO23 MLE10 MLE11 MLE09 MLE07 MLE14 MLE12 MLE13 BEC18 RG01 TRA07 TRA08 POL11 BEC12 IB05
NC_045595.1 49821 16:11:+ T C . PASS NS=30;AF=0.167 GT:DP:AD:GQ:GL 1/1:25:0,25:40:-100.66,-8.34,-0.00 1/1:12:0,12:40:-48.01,-4.43,-0.00 1/1:23:0,23:40:-92.56,-7.74,-0.00 1/1:18:0,18:40:-72.31,-6.24,-0.00 1/1:50:0,50:40:-201.91,-15.87,-0.00 0/0:12:12,0:40:-0.00,-5.02,-49.18 0/0:19:19,0:40:-0.00,-7.12,-77.53 0/0:21:21,0:40:-0.00,-7.73,-85.63 0/0:21:21,0:40:-0.00,-7.73,-85.63 0/0:8:8,0:40:-0.00,-3.81,-32.98 0/0:2:2,0:26:-0.00,-2.01,-8.69 0/0:2:2,0:26:-0.00,-2.01,-8.69 0/0:25:25,0:40:-0.00,-8.93,-101.83 0/0:14:14,0:40:-0.00,-5.62,-57.28 0/0:18:18,0:40:-0.00,-6.82,-73.48 0/0:36:36,0:40:-0.00,-12.24,-146.38 0/0:14:14,0:40:-0.00,-5.62,-57.28 0/0:38:38,0:40:-0.00,-12.84,-154.48 0/0:31:31,0:40:-0.00,-10.74,-126.13 0/0:11:11,0:40:-0.00,-4.72,-45.13 0/0:6:6,0:39:-0.00,-3.21,-24.89 0/0:9:9,0:40:-0.00,-4.11,-37.03 0/0:25:25,0:40:-0.00,-8.93,-101.83 0/0:44:44,0:40:-0.00,-14.65,-178.78 0/0:39:39,0:40:-0.00,-13.14,-158.53 0/0:6:6,0:39:-0.00,-3.21,-24.89 0/0:10:10,0:40:-0.00,-4.41,-41.08 0/0:5:5,0:35:-0.00,-2.91,-20.84 ./. 0/0:4:4,0:32:-0.00,-2.61,-16.79 ./. 0/0:5:5,0:35:-0.00,-2.91,-20.84
NC_045595.1 53541 19:8:- G C . PASS NS=31;AF=0.161 GT:DP:AD:GQ:GL 0/0:16:16,0:40:-0.00,-5.25,-68.71 0/0:59:59,0:40:-0.00,-18.19,-250.15 0/0:19:19,0:40:-0.00,-6.15,-81.37 0/0:30:30,0:40:-0.00,-9.46,-127.78 0/0:61:61,0:40:0.00,-18.79,-258.59 0/0:9:9,0:38:-0.00,-3.14,-39.17 0/0:30:30,0:40:-0.00,-9.46,-127.78 0/0:16:16,0:40:-0.00,-5.25,-68.71 0/0:22:22,0:40:-0.00,-7.05,-94.03 0/0:29:29,0:40:-0.00,-9.16,-123.56 0/1:26:21,5:40:-12.84,-0.00,-81.55
I hope you can help me, because this is driving me crazy hahaha.
Diego