I have a VCF file created by running GATK on read files against a reference genome. Each variant in the VCF file has a location, which is its position on the reference genome. Sample lines include:
NC_002516.2 92915 . T A 1941.76 PASS AC=2;AF... GT:AD:DP:GQ:PL 1/1:0,80:81:99:1975,240,0
NC_002516.2 192617 . GA G 2562.66 PASS AC=2;AF=... GT:AD:DP:GQ:PL 1/1:0,64:64:99:2605,193,0
I also have a consensus sequence created by vcftools, which starts off as -
What I need, though, is the variant location on the consensus sequence. If, from the VCF file, '92915' is the first variant, then this is its location on the reference as well as on the consensus. Subsequent variants, however, come after indels, which shift positions on the consensus forward (insertions) or backward (deletions). So I need a tool to calculate each variant's location on the consensus.
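To make the calculation concrete, here is a rough Python sketch of what I think such a tool would have to do. It is not an existing utility. It assumes the VCF records are sorted by position, that each record has a single ALT allele, that variants do not overlap, and that every variant is applied to the consensus (as with the 1/1 genotypes above). The function name `consensus_positions` and the third record in the example are made up for illustration.

```python
from collections import defaultdict


def consensus_positions(vcf_lines):
    """Map each variant's reference position to its consensus position.

    Assumptions (not checked here): records are sorted by position within
    each contig, have exactly one ALT allele, do not overlap, and are all
    applied to the consensus.
    """
    offsets = defaultdict(int)  # running length change per contig
    results = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref, alt = line.split()[:5]
        pos = int(pos)
        # The variant starts at the reference position shifted by the net
        # length change of all earlier variants on the same contig.
        results.append((chrom, pos, pos + offsets[chrom]))
        # Insertions lengthen the consensus, deletions shorten it.
        offsets[chrom] += len(alt) - len(ref)
    return results


# First two records use the columns from the sample lines above;
# the third is a hypothetical insertion added only for illustration.
example = [
    "NC_002516.2 92915 . T A",
    "NC_002516.2 192617 . GA G",
    "NC_002516.2 200000 . C CTT",
]

for chrom, ref_pos, cons_pos in consensus_positions(example):
    print(chrom, ref_pos, cons_pos)
```

In this example the first two variants keep their reference positions, while the hypothetical third one moves back by one base to 199999 because of the 1 bp deletion at 192617.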
(And then I will need to get annotation data for that region.) Any idea how I can get the variant locations on the consensus, please?
Actually, VCFtools is also giving an error, so I need to find another utility to create the consensus sequence. Much appreciated.