Coordinate Mismatch For Exons From UCSC Genome
9.7 years ago
Max ▴ 140

I have been using UCSC's Genome Browser to extract exon sequences and their coordinates. One problem I've encountered is that the exonStarts/exonEnds values in the annotation tables (which also include exonFrames, etc.) are close to, but do not exactly match, the coordinates reported with the sequences.

For instance, notice that the start (top) and end (bottom) positions for the exons below are close to, but don't quite match, those in the sequence data:

name    chrom    strand    exonStarts    exonEnds    exonFrames
NM_032291    chr1    +    66999824,67091529,67098752,67101626,67105459,67108492,67109226,67126195,67133212,67136677,67137626,67138963,67142686,67145360,67147551,67154830,67155872,67161116,67184976,67194946,67199430,67205017,67206340,67206954,67208755,    67000051,67091593,67098777,67101698,67105516,67108547,67109402,67126207,67133224,67136702,67137678,67139049,67142779,67145435,67148052,67154958,67155999,67161176,67185088,67195102,67199563,67205220,67206405,67207119,67210768,    0,1,2,0,0,0,1,0,0,0,1,2,1,1,1,1,0,1,1,2,2,0,2,1,1,



>hg19_refGene_NM_032291_0 range=chr1:67000042-67000051 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATGATGGAAG
GATTGAAAAAACGTACAAGGAAGGCCTTTGGAATACGGAAGAAAGAAAAG
GACACTGATTCTAC
AGGTTCACCAGATAGAGATGGAATT
CAGCCCAGCCCACACGAACCACCCTACAATAGCAAAGCAGAGTGTGCGCG
TGAAGGAGGAAAAAAAGTTTCG


Unfortunately, the sequences and the data tables have to be called separately, so I'm left with trying to resolve the matter from the conflicting data that I have.


What should we see in your example? Where is the problem?


To give a specific example, the first exon from the list is:

Now, the coordinates that are given for this exon are: 66999824 (START), 67000051 (STOP), while the exonFrame is 0.

The first issue is the mismatch between the start positions (by a single nucleotide, though for other exons the mismatch can be 2 or more). The second issue is how to interpret the exonFrames variable. If the exonFrames value is 0, is this with respect to the entire exon, or just with respect to the CDS region?
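
One possible explanation for the single-nucleotide offset, offered as a sketch rather than a confirmed diagnosis: UCSC annotation tables store intervals as 0-based, half-open coordinates, while positions shown in the browser and in sequence headers are 1-based and closed. Under that convention the table start is always one less than the displayed start, and the ends agree. The helper names below are hypothetical; the numbers are the second exon from the table above.

```python
def table_to_display(start0, end0):
    """Convert a 0-based, half-open table interval to a 1-based, closed one.

    Only the start shifts by one; the end value is numerically unchanged.
    """
    return start0 + 1, end0


def interval_length(start0, end0):
    """Length in bases of a 0-based, half-open interval."""
    return end0 - start0


# Second exon of NM_032291 as listed in exonStarts/exonEnds.
start1, end1 = table_to_display(67091529, 67091593)
print(f"chr1:{start1}-{end1}")               # chr1:67091530-67091593
print(interval_length(67091529, 67091593))   # 64
```

The length of 64 appears to agree with the 64 bases in the second block of the sequence output (50 + 14), which is at least consistent with this reading. A start that differs by more than one, as in the first exon, may instead indicate that the sequence covers only the coding part of the exon.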


It is a bit hard to follow exactly what you are doing here... Can you provide the precise queries you are performing to obtain these data? Are you doing this manually in the browser or using an SQL query? What is your ultimate goal?


I've been working manually with the browser.

Basically, I need the following information: the coding exon sequence (excluding the 5' and 3' UTRs), the coordinates of the coding exon sequence, and the reading frame (0, 1, 2) of each exon sequence.

Is there some way of obtaining this information with a single query?
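
This is not a single query, but a minimal sketch of how the coordinates and frames could be derived from one table row, assuming the row also provides cdsStart and cdsEnd (the genePred-style tables carry these fields) and that table coordinates are 0-based, half-open. It handles the + strand only. The function name is hypothetical, `CDS_START` is inferred from the FASTA header above (`range=chr1:67000042-67000051`) rather than read from the table, and `CDS_END` is a placeholder because the real value is not shown in this thread. Only the first four exons are used.

```python
def coding_exons(exon_starts, exon_ends, cds_start, cds_end):
    """Clip exons to the CDS and report (start_1based, end_1based, frame).

    Inputs are 0-based, half-open; + strand only. The frame is the number
    of coding bases before the exon, modulo 3.
    """
    rows = []
    coding_bases = 0
    for start, end in zip(exon_starts, exon_ends):
        c_start = max(start, cds_start)   # trim 5' UTR
        c_end = min(end, cds_end)         # trim 3' UTR
        if c_start >= c_end:
            continue                      # exon is entirely UTR
        rows.append((c_start + 1, c_end, coding_bases % 3))
        coding_bases += c_end - c_start
    return rows


exon_starts = [66999824, 67091529, 67098752, 67101626]
exon_ends = [67000051, 67091593, 67098777, 67101698]
CDS_START = 67000041  # assumption: inferred from the FASTA header
CDS_END = 67101698    # placeholder: real cdsEnd is not given here

for row in coding_exons(exon_starts, exon_ends, CDS_START, CDS_END):
    print(row)
# (67000042, 67000051, 0)
# (67091530, 67091593, 1)
# (67098753, 67098777, 2)
# (67101627, 67101698, 0)
```

The computed frames 0, 1, 2, 0 match the first four exonFrames values in the table, which suggests the frame is counted from the start of the CDS rather than from the start of the exon. The sequences themselves would still have to be fetched separately for these ranges.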


Is this question still relevant? If you have solved it, you should post your own answer. Besides, it isn't clear what is being asked here. Voting to close.