When I go to the UCSC Table Browser and get the BED file for NM_000314.6 (gene = "PTEN"), I find that the first exon starts at position 89623194 in GRCh37/hg19 (!). Because BED start coordinates are 0-based, the exon starts at 89623194 + 1 = 89623195 in a 1-based coordinate system.
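To spell out that conversion, here is a minimal sketch of the off-by-one step. The function name is my own and only for illustration; it assumes the usual BED convention of a 0-based start.

```python
def bed_start_to_one_based(bed_start: int) -> int:
    """Convert a 0-based BED start coordinate to a 1-based position (as used in VCF)."""
    return bed_start + 1


# Start of the first exon of NM_000314.6 as reported in the BED file (hg19).
EXON1_BED_START = 89623194

print(bed_start_to_one_based(EXON1_BED_START))  # 89623195
```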
Now I've created an artificial VCF file:
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER
10 89623195 C A . PASS
It contains one SNP at position 89623195, the first nucleotide of the first exon. I then annotate that SNP with the following snpEff command:
java -Xmx4g -jar ~/snpEff_latest_core/snpEff/snpEff.jar GRCh37.p13.RefSeq artificial.vcf
The ANN record that I get for NM_000314 is the following:
I want to focus on the HGVS nomenclature of that record: c.-1032C>A. If you navigate to the NCBI page of NM_000314, you will see that the coding sequence starts at position 1032 of the transcript. If I had introduced a SNP at transcript position 1031, the expected HGVS nomenclature would be c.-1C>A. Following that logic, my SNP at the first nucleotide of exon 1 (transcript position 1) should have the HGVS nomenclature c.-1031C>A, but snpEff reports c.-1032C>A instead. Is there an error in the snpEff algorithm?
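To make my expectation explicit, here is a minimal sketch of the arithmetic I am using for 5' UTR positions. It is not how snpEff computes it; the function name is my own, and it assumes a 1-based transcript position and a CDS start of 1032 as shown on the NCBI page.

```python
def utr5_hgvs_position(transcript_pos: int, cds_start: int) -> str:
    """Return the HGVS c. position for a 1-based transcript position in the 5' UTR.

    HGVS has no c.0: the base immediately upstream of the first coding base is c.-1.
    """
    if not 1 <= transcript_pos < cds_start:
        raise ValueError("position must lie upstream of the CDS start")
    return f"c.-{cds_start - transcript_pos}"


CDS_START = 1032  # assumed from the NCBI record for NM_000314

print(utr5_hgvs_position(1031, CDS_START))  # c.-1
print(utr5_hgvs_position(1, CDS_START))     # c.-1031
```

With this counting, c.-1032 would correspond to a base one position upstream of transcript position 1, which is why the reported annotation surprises me.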