Question: Differences in exon number from Biopython Entrez Package and ANNOVAR
0
gravatar for gero.knittel
25 days ago by
gero.knittel10
gero.knittel10 wrote:

Hi everybody,

I have a strange problem, which for sure has something to do with the version of some reference that I'm using.

I'm counting the number of exons of the transcript NM_003752 in the following way:

from Bio import Entrez
Entrez.email = 'xxxxxx'

handle = Entrez.efetch(db='Nucleotide', id='NM_003752', retmode='xml')
record = Entrez.read(handle)
feature_count = len(record[0]['GBSeq_feature-table'])
exoncount = 0
for i in range(0, feature_count):
    if record[0]['GBSeq_feature-table'][i]['GBFeature_key'] == 'exon':
        exoncount += 1
print(exoncount)

Output is 21. Also, if I count manually on the NCBI website (https://www.ncbi.nlm.nih.gov/nuccore/557129041/), I count 21 exons.

However, if I run ANNOVAR using the following input file:

16  28391160    28391160    G   A

with the following command

annotate_variation.pl -out results -build hg19 annovar.avinput humandb/ --separate

I get this result:

line1   nonsynonymous SNV EIF3CL:NM_001317857:exon21:c.C2704T:p.R902C,EIF3C:NM_001286478:exon22:c.C2671T:p.R891C,EIF3CL:NM_001099661:exon21:c.C2704T:p.R902C,EIF3C:NM_001267574:exon21:c.C2701T:p.R901C,EIF3C:NM_001037808:exon22:c.C2701T:p.R901C,EIF3CL:NM_001317856:exon21:c.C2704T:p.R902C,EIF3C:NM_001199142:exon22:c.C2701T:p.R901C,EIF3C:NM_003752:exon22:c.C2701T:p.R901C,  16  28391160    28391160    G   A

So it says that for transcript NM_003752, the resulting mutation will be in exon 22, which it shouldn't even have.

What's the problem here?

Thanks very much for helping out!

Best, Gero

ADD COMMENTlink modified 25 days ago by Kevin Blighe23k • written 25 days ago by gero.knittel10
1

@gero: I removed your email address from the post.

ADD REPLYlink written 25 days ago by genomax51k

Thank you, that was not very considerate of me.

ADD REPLYlink written 25 days ago by gero.knittel10

gero.knittel : You should email ANNOVAR authors and let them know if this discrepancy.

ADD REPLYlink written 25 days ago by genomax51k
0
gravatar for Kevin Blighe
25 days ago by
Kevin Blighe23k
Republic of Ireland
Kevin Blighe23k wrote:

Hello and good evening,

ANNOVAR is correct. For that particular transcript isoform, NM_001286478, it does indeed have 22 exons. The 'split' occurs at exon 9 of the gene. In the following screenshot from the UCSC Genome Browser, this transcript is highlighted in blue (at left) and you can see that, in exon 9, a few bases are not transcribed in that and other transcripts:

f

[source: https://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtM...]

Kevin

ADD COMMENTlink written 25 days ago by Kevin Blighe23k

Kevin Blighe : OP is asking about NM_003752:exon22:c.C2701T:p.R901C which is at the end of that long line.

It could be an error with ANNOVAR's annotation correct?

ADD REPLYlink modified 25 days ago • written 25 days ago by genomax51k

Indeed, I was looking at the incorrect transcript. NM_001286478 (and a few other transcripts) have 22 exons, whereas NM_003752, in which Gero is interested, only has 21 exons in all transcripts that I have observed.

ADD REPLYlink written 25 days ago by Kevin Blighe23k

I have just checked my own version of ANNOVAR (Version: $Date: 2015-06-17 21:43:51 -0700 (Wed, 17 Jun 2015)), and it gives a different result for that transcript. It says exon 21, correctly:

line1   nonsynonymous SNV   EIF3C:NM_001037808:exon21:c.C2704T:p.R902C,EIF3CL:NM_001099661:exon21:c.C2704T:p.R902C,EIF3C:NM_001199142:exon21:c.C2704T:p.R902C,EIF3C:NM_001267574:exon20:c.C2704T:p.R902C,EIF3C:NM_001286478:exon21:c.C2674T:p.R892C,EIF3C:NM_003752:exon21:c.C2704T:p.R902C,    16  28391160    28391160    G   A

Which version of ANNOVAR are you using?

ADD REPLYlink written 25 days ago by Kevin Blighe23k

Hi Kevin,

I'm using the current version (Version: $Date: 2018-04-16 00:43:31 -0400 (Mon, 16 Apr 2018)).

ADD REPLYlink written 25 days ago by gero.knittel10

What is the source for the annotation by ANNOVAR? At ensembl I cannot find any transcript with 22 exons. They all have 21 or less.

I'v seen it at least once that the data from UCSC were wrong. This leads to a problem in a custom target resequencing kit we let design by Agilent. We gave them the transcript number we liked to capture and they design the probes. For one exon, in one gene the coordinates were wrong at the end, so we could not capture this one. Investigating on that problem it turned out that UCSC had the wrong coordinates for that exon.

fin swimmer

ADD REPLYlink modified 25 days ago • written 25 days ago by finswimmer3.6k

well, IGV (v 2.4.10) shows it's exon 22 for transcript (for both hg19 and b37) (screenshot attached). But UCSC table data for NM_003752 (for grch37/ hg19) lists only 21 exons. Probably, this discrepancy is due to incorrect transcript version in annotation, for that build.

Screenshot_at_2018_06_21_12_56_32

ADD REPLYlink modified 25 days ago • written 25 days ago by cpad01127.5k

Hm, I'm pretty sure this is some kind of version problem of the annotation source.

In my IGV (v 2.4.10) it's exon 21.

NM_003752

ADD REPLYlink modified 25 days ago • written 25 days ago by finswimmer3.6k

I think it is the problem with annotation source. I deleted earlier hg19.genome from IGV (genomes folder) and downloaded new one. This time, the transcript has 21 exons.

Screenshot_at_2018_06_21_14_46_49

ADD REPLYlink written 25 days ago by cpad01127.5k

In the UCSC screenshot that I showed, some also have 22 exons (I believe that they were 'UCSC genes'). Gene annotation is a constantly evolving process, as far as I understand.

ADD REPLYlink written 25 days ago by Kevin Blighe23k

Ladies and Gentlemen: what we are witnessing here is the complexity of the genome at play... transcription is pervasive and constant, and performed in ways we don't fully grasp in our daily research activities.

ADD REPLYlink written 25 days ago by Kevin Blighe23k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1539 users visited in the last hour