I have a strange problem, which for sure has something to do with the version of some reference that I'm using.
I'm counting the number of exons of the transcript NM_003752 in the following way:
from Bio import Entrez Entrez.email = 'xxxxxx' handle = Entrez.efetch(db='Nucleotide', id='NM_003752', retmode='xml') record = Entrez.read(handle) feature_count = len(record['GBSeq_feature-table']) exoncount = 0 for i in range(0, feature_count): if record['GBSeq_feature-table'][i]['GBFeature_key'] == 'exon': exoncount += 1 print(exoncount)
Output is 21. Also, if I count manually on the NCBI website (https://www.ncbi.nlm.nih.gov/nuccore/557129041/), I count 21 exons.
However, if I run ANNOVAR using the following input file:
16 28391160 28391160 G A
with the following command
annotate_variation.pl -out results -build hg19 annovar.avinput humandb/ --separate
I get this result:
line1 nonsynonymous SNV EIF3CL:NM_001317857:exon21:c.C2704T:p.R902C,EIF3C:NM_001286478:exon22:c.C2671T:p.R891C,EIF3CL:NM_001099661:exon21:c.C2704T:p.R902C,EIF3C:NM_001267574:exon21:c.C2701T:p.R901C,EIF3C:NM_001037808:exon22:c.C2701T:p.R901C,EIF3CL:NM_001317856:exon21:c.C2704T:p.R902C,EIF3C:NM_001199142:exon22:c.C2701T:p.R901C,EIF3C:NM_003752:exon22:c.C2701T:p.R901C, 16 28391160 28391160 G A
So it says that for transcript NM_003752, the resulting mutation will be in exon 22, which it shouldn't even have.
What's the problem here?
Thanks very much for helping out!