Exons Associated With Multiple "Genes" In UCSC Genome Data
9.3 years ago
Max ▴ 140

When I retrieve a complete exon list from the UCSC Genome RefSeq genes (human reference genome), I often find a single exon (same coordinates, defined by chromosome number and start/end nucleotides) listed multiple times, i.e. with multiple genes defined by NM_ numbers. Is this to be expected? I realize that there are instances of exons that are shared across multiple "genes," but there seem to be far too many instances of this in the sequence list and data tables to be due to actual shared exons alone.

exon ucsc genome • 2.8k views
9.3 years ago
Geparada ★ 1.5k

Hi Max,

First of all, the "NM_xxx" identifiers are transcript annotations, not genes. One gene can have multiple annotated transcripts (isoforms) due to alternative splicing. So yes, it is totally expected that if you extract the exons from any mammalian transcript annotation (RefSeq, UCSC Genes, GENCODE), you will have exons listed multiple times.

Now, the number of times each exon is listed MUST equal the number of transcripts that contain that exon. I recommend that you pick a particular exon and compare the number of times it is listed with the number of transcripts that contain it (you can do this just by viewing the genome browser at the exon coordinates). If the numbers don't match, it means that your method for extracting exons from the transcript annotations has a bug.
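If you would rather do that count in a script than by eye, here is a minimal sketch in Python. It assumes a tab-separated refGene-style table in which column 2 is the transcript accession, columns 3 and 4 are chromosome and strand, columns 10 and 11 are comma-separated exon starts and ends, and column 13 is the gene symbol; check these positions against the schema of the table you actually downloaded. The function name `exon_to_transcripts` and the two toy rows (`NM_TOY1`, `NM_TOY2`, `GENE_A`) are made up for illustration and are not real annotations.

```python
from collections import defaultdict


def exon_to_transcripts(lines):
    """Map each exon (chrom, strand, start, end) to the set of
    (transcript, gene) pairs that contain it.

    Assumed column layout (0-based): 1 = transcript name, 2 = chrom,
    3 = strand, 9 = exonStarts, 10 = exonEnds, 12 = gene symbol.
    """
    exons = defaultdict(set)
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, chrom, strand, gene = fields[1], fields[2], fields[3], fields[12]
        # Exon lists may end with a trailing comma, so drop empty items.
        starts = [int(x) for x in fields[9].split(",") if x]
        ends = [int(x) for x in fields[10].split(",") if x]
        for start, end in zip(starts, ends):
            exons[(chrom, strand, start, end)].add((name, gene))
    return exons


# Two made-up transcripts of one made-up gene that share their first exon.
toy_rows = [
    "0\tNM_TOY1\tchr1\t+\t100\t500\t100\t500\t2\t100,300,\t200,500,\t0\tGENE_A",
    "0\tNM_TOY2\tchr1\t+\t100\t700\t100\t700\t2\t100,600,\t200,700,\t0\tGENE_A",
]

exons = exon_to_transcripts(toy_rows)
for exon, members in sorted(exons.items()):
    transcripts = sorted(t for t, _ in members)
    genes = sorted({g for _, g in members})
    print(exon, "transcripts:", transcripts, "genes:", genes)
```

An exon listed under several NM_ accessions that all carry the same gene symbol is just isoform sharing. Exons whose set contains more than one gene symbol are the genuinely shared cases asked about in the question.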


Thanks. I'll have to check to see if there's a match with the number of transcripts or not.

