Help understanding CHROM and POS fields in VCF
3.2 years ago
bdolin

Can someone explain how to relate CHROM and POS in a VCF to the corresponding position in the referenced dbSNP entry?

For instance, in the following example:

#CHROM  POS    ID         REF  ALT
20      14370  rs6054257  G    A


It looks as though the VCF is referring to a variant on chr20 at position 14370, but the referenced dbSNP record gives the HGVS expression NC_000020.10:g.66370G>A.
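Reading CHROM and POS out of a record is just a matter of taking the first fixed columns. The sketch below assumes a tab-delimited data line, as the VCF specification requires (the excerpt above is shown space-aligned only for readability); the `VcfRecord` and `parse_vcf_line` names are hypothetical. Note that POS is 1-based and only meaningful together with the reference assembly declared in the file's header.

```python
from typing import NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int  # 1-based position on the assembly named in the VCF header
    id: str
    ref: str
    alt: str


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the first five fixed columns of a tab-delimited VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 tab-separated columns, got {len(fields)}")
    chrom, pos, vid, ref, alt = fields[:5]
    return VcfRecord(chrom, int(pos), vid, ref, alt)


rec = parse_vcf_line("20\t14370\trs6054257\tG\tA")
print(rec.chrom, rec.pos, rec.ref, rec.alt)  # 20 14370 G A
```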

If the VCF record didn't include the dbSNP id, how would one convert this into an HGVS expression?


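Your interpretation of CHROM and POS is correct. Keep in mind that POS is a coordinate on the chromosome of whichever reference assembly was used, and that an HGVS expression is always written against a specific reference sequence (a genomic accession such as NC_000020.10 for g. notation, or a transcript for c. notation). For annotation you can use VEP or SnpEff, which will give you HGVS notation.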
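For the simplest case, a single-nucleotide substitution, the genomic (g.) HGVS form can be built by hand once you know the RefSeq chromosome accession for the same assembly as the VCF. This is a minimal sketch under that assumption; `snv_to_hgvs_g` is a hypothetical helper, and indels are deliberately rejected because HGVS requires them to be shifted and described differently from VCF's padded REF/ALT, which is where VEP or SnpEff earn their keep.

```python
VALID_BASES = {"A", "C", "G", "T"}


def snv_to_hgvs_g(accession: str, pos: int, ref: str, alt: str) -> str:
    """Build a genomic HGVS substitution for a single-nucleotide VCF variant.

    accession must be the RefSeq accession.version of the chromosome in the
    same assembly that the VCF POS refers to (e.g. NC_000020.10 for chr20 on GRCh37).
    """
    if ref not in VALID_BASES or alt not in VALID_BASES or ref == alt:
        raise ValueError("only single-nucleotide substitutions are handled here")
    # VCF POS and HGVS g. positions are both 1-based, so no offset is applied
    return f"{accession}:g.{pos}{ref}>{alt}"


print(snv_to_hgvs_g("NC_000020.10", 66370, "G", "A"))  # NC_000020.10:g.66370G>A
```

Feeding it POS 14370 with NC_000020.10 would produce a syntactically valid but wrong expression, because that POS belongs to a different assembly than the accession.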

Thank you for your answer. If the VCF POS is 14370, how can the corresponding HGVS expression have position 66370?


Which reference sequence did you use for alignment and variant calling? I guess something went wrong here, because rs6054257 is located on chromosome 20 at position 66370 in hg19, or at position 85729 in hg38.

In those assemblies, no variants are known at position 14370, because this region has not been sequenced successfully so far; the reference contains only Ns there.

fin swimmer

Here is where I took the example from: http://www.internationalgenome.org/wiki/Analysis/vcf4.0/

I believe that's hg18.

That was my thought too, just after sending my answer. The linked example contains the header line

##reference=1000GenomesPilot-NCBI36

which is hg18.
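Since the assembly is what ties POS to a dbSNP or HGVS position, it is worth checking the `##reference` meta-information line before interpreting coordinates. A minimal sketch, assuming the header is available as an iterable of lines; `reference_from_header` is a hypothetical name, and the header below is an abbreviated illustration based on the linked VCF 4.0 example.

```python
from typing import Iterable, Optional


def reference_from_header(lines: Iterable[str]) -> Optional[str]:
    """Return the value of the ##reference= meta-information line, if present."""
    for line in lines:
        if line.startswith("##reference="):
            return line.rstrip("\n").split("=", 1)[1]
        if line.startswith("#CHROM"):
            break  # meta-information lines end at the column header line
    return None


header = [
    "##fileformat=VCFv4.0\n",
    "##reference=1000GenomesPilot-NCBI36\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
print(reference_from_header(header))  # 1000GenomesPilot-NCBI36
```

Mapping that string to an assembly name such as hg18 is still a manual step, since the value is free text.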

fin swimmer