Here is an example variant for sample NA20317. It comes from the 1000 Genomes data (file: ftp://ftp.ncbi.nih.gov///1000genomes/ftp/release/20110521/ALL.chr8.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz) and was extracted with tabix and the vcf-subset script:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA20317
8 2122082 MERGED_DEL_2_47886 TGGAAGACAGTGGCAGGTCATCAGGCATTAGTTAGTTTCTCATAAGGAGCGTACAGCCCAGATCGCTCGCACGCACAGTTCACAATAGGGTTCGAGCTCCCATGAGAATCTAATGCCGCCCCTGATCTGACAGGAGGTGGAGCTCAGGCGGTCATGTGAGCAGTGGGGAGCAGCTGTAAATACAGGTGAAGCTTCGTTGGCTCACTTGCTGGACTGCCACTCACCTCCTGCTGTGTGTCTGGGTTCCTAACAGGCCACGGCGCGG T AC=2;AF=0.68;AFR_AF=0.62;AMR_AF=0.55;AN=2;ASN_AF=0.83;AVGPOST=0.9465;CIEND=-177,243;CIPOS=-187,234;END=2122346;ERATE=0.0296;EUR_AF=0.66;HOMLEN=2;HOMSEQ=GG;LDAF=0.6749;RSQ=0.9079;SVLEN=-264;SVTYPE=DEL;THETA=0.0005;VT=SV GT:DS:GL 1|1:2.000:-2.91,-1.56,-0.01
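To inspect the record programmatically, a minimal Python sketch can split the INFO column into key/value pairs and match the FORMAT keys (GT:DS:GL) with the NA20317 values. The helpers parse_info and parse_sample are hypothetical illustrations, not functions from vcftools or any VCF library.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'AC=2;AF=0.68;...' into a dict.

    Entries without '=' (Flag-type fields) are stored as True.
    """
    result = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        result[key] = value if sep else True
    return result


def parse_sample(fmt: str, sample: str) -> dict:
    """Pair the colon-separated FORMAT keys with one sample's values."""
    return dict(zip(fmt.split(":"), sample.split(":")))


# INFO, FORMAT and sample columns copied from the record above.
INFO = ("AC=2;AF=0.68;AFR_AF=0.62;AMR_AF=0.55;AN=2;ASN_AF=0.83;AVGPOST=0.9465;"
        "CIEND=-177,243;CIPOS=-187,234;END=2122346;ERATE=0.0296;EUR_AF=0.66;"
        "HOMLEN=2;HOMSEQ=GG;LDAF=0.6749;RSQ=0.9079;SVLEN=-264;SVTYPE=DEL;"
        "THETA=0.0005;VT=SV")

info = parse_info(INFO)
gt = parse_sample("GT:DS:GL", "1|1:2.000:-2.91,-1.56,-0.01")

print(info["SVTYPE"], info["SVLEN"], info["END"])  # DEL -264 2122346
print(info["CIPOS"], info["CIEND"])                # -187,234 -177,243
print(gt["GT"])                                    # 1|1
```

All values stay as strings here; converting numbers (for example int(info["END"])) is left to the caller.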
As I understand it, the variant is a deletion of all nucleotides from position 2122083 to 2122346.
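The coordinate arithmetic behind this reading can be checked directly against the record. The sketch below assumes the usual VCF padding convention for deletions: ALT "T" is the base at POS shared with the first base of REF, and everything in REF after it is deleted. The variable names are illustrative only.

```python
# Values copied from the record above (VCF positions are 1-based).
POS = 2122082
END = 2122346
SVLEN = -264
ALT = "T"
REF = (
    "TGGAAGACAGTGGCAGGTCATCAGGCATTAGTTAGTTTCTCATAAGGAGC"
    "GTACAGCCCAGATCGCTCGCACGCACAGTTCACAATAGGGTTCGAGCTCC"
    "CATGAGAATCTAATGCCGCCCCTGATCTGACAGGAGGTGGAGCTCAGGCG"
    "GTCATGTGAGCAGTGGGGAGCAGCTGTAAATACAGGTGAAGCTTCGTTGG"
    "CTCACTTGCTGGACTGCCACTCACCTCCTGCTGTGTGTCTGGGTTCCTAA"
    "CAGGCCACGGCGCGG"
)

# Assumed padding convention: REF and ALT share the first base at POS,
# so the deleted bases are everything in REF after that shared prefix.
assert REF.startswith(ALT)
deleted = REF[len(ALT):]

first_deleted = POS + len(ALT)        # first position removed
last_deleted = POS + len(REF) - 1     # last position covered by REF

print(first_deleted, last_deleted, len(deleted))    # 2122083 2122346 264
print(last_deleted == END, -len(deleted) == SVLEN)  # True True
```

REF is 265 bases long, so END, SVLEN and the REF/ALT pair all describe the same 264 bp deletion.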
But in that case, how should the fields CIEND=-177,243 and CIPOS=-187,234 be interpreted?
In the header they are specified as:
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END for imprecise variants">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
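Read literally, these definitions turn each pair of numbers into a coordinate window. The sketch below assumes, as in the examples of the VCF 4.x specification, that the two values are offsets added to POS (for CIPOS) or END (for CIEND); ci_range is a hypothetical helper written for this illustration.

```python
from typing import Tuple


def ci_range(anchor: int, ci: str) -> Tuple[int, int]:
    """Add the two comma-separated CI offsets to an anchor position."""
    low, high = (int(x) for x in ci.split(","))
    return anchor + low, anchor + high


POS, END = 2122082, 2122346
pos_range = ci_range(POS, "-187,234")  # window for the start breakpoint
end_range = ci_range(END, "-177,243")  # window for the end breakpoint

print(pos_range)  # (2121895, 2122316)
print(end_range)  # (2122169, 2122589)
```

Under this reading, each window is about 420 bp wide, wider than the 264 bp deletion itself, and the two windows overlap.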
So does this mean that the variant is imprecise and that its exact position and length are unknown? If so, why are such precise values given (the REF and ALT sequences, END, SVLEN)?