Coordinate Mismatch For Exons From UCSC Genome
10.7 years ago
Max ▴ 150

I have been using UCSC's genome browser to extract exon sequences and their coordinates. One problem I've encountered is that the exonStarts/exonEnds values in the annotation tables (which also include exonFrames, etc.) are close to, but do not exactly match, the coordinates given with the sequences.

For instance, notice that the exon start and end positions in the table (top) are close to, but don't quite match, the ranges given in the sequence headers (bottom):

name    chrom    strand    exonStarts    exonEnds    exonFrames
NM_032291    chr1    +    66999824,67091529,67098752,67101626,67105459,67108492,67109226,67126195,67133212,67136677,67137626,67138963,67142686,67145360,67147551,67154830,67155872,67161116,67184976,67194946,67199430,67205017,67206340,67206954,67208755,    67000051,67091593,67098777,67101698,67105516,67108547,67109402,67126207,67133224,67136702,67137678,67139049,67142779,67145435,67148052,67154958,67155999,67161176,67185088,67195102,67199563,67205220,67206405,67207119,67210768,    0,1,2,0,0,0,1,0,0,0,1,2,1,1,1,1,0,1,1,2,2,0,2,1,1,


>hg19_refGene_NM_032291_0 range=chr1:67000042-67000051 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATGATGGAAG
>hg19_refGene_NM_032291_1 range=chr1:67091530-67091593 5'pad=0 3'pad=0 strand=+ repeatMasking=none
GATTGAAAAAACGTACAAGGAAGGCCTTTGGAATACGGAAGAAAGAAAAG
GACACTGATTCTAC
>hg19_refGene_NM_032291_2 range=chr1:67098753-67098777 5'pad=0 3'pad=0 strand=+ repeatMasking=none
AGGTTCACCAGATAGAGATGGAATT
>hg19_refGene_NM_032291_3 range=chr1:67101627-67101698 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CAGCCCAGCCCACACGAACCACCCTACAATAGCAAAGCAGAGTGTGCGCG
TGAAGGAGGAAAAAAAGTTTCG

Unfortunately, the sequences and the data tables have to be called separately, so I'm left with trying to resolve the matter from the conflicting data that I have.
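Since the two outputs come from separate requests, one way to line them up is to parse the `range=` field of each sequence header and compare it with the table entry for the same exon. The following is only a sketch: it assumes the trailing `_0`, `_1`, ... in each header name is the exon's position in the exonStarts/exonEnds lists, and the helper names (`parse_header`, `parse_coord_list`, `compare`) are made up for illustration. Only the first four exons from the post are used.

```python
import re

# Matches headers such as:
# >hg19_refGene_NM_032291_1 range=chr1:67091530-67091593 5'pad=0 ...
HEADER_RE = re.compile(
    r"^>(?P<name>\S+)_(?P<index>\d+)\s+"
    r"range=(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)"
)

def parse_header(line):
    """Return (name, exon index, chrom, start, end) from one header line."""
    m = HEADER_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised header: {line!r}")
    return m["name"], int(m["index"]), m["chrom"], int(m["start"]), int(m["end"])

def parse_coord_list(field):
    """Turn a comma-terminated table field like '1,2,3,' into [1, 2, 3]."""
    return [int(x) for x in field.split(",") if x]

def compare(headers, exon_starts, exon_ends):
    """For each header, report (exon index, start offset, end offset).

    Offsets are header value minus table value. Assumes the header's
    trailing index points into the table lists (an assumption, see above).
    """
    rows = []
    for line in headers:
        _, i, _, start, end = parse_header(line)
        rows.append((i, start - exon_starts[i], end - exon_ends[i]))
    return rows

# First four exons of NM_032291, copied from the post.
exon_starts = parse_coord_list("66999824,67091529,67098752,67101626,")
exon_ends = parse_coord_list("67000051,67091593,67098777,67101698,")
headers = [
    ">hg19_refGene_NM_032291_0 range=chr1:67000042-67000051 5'pad=0 3'pad=0 strand=+ repeatMasking=none",
    ">hg19_refGene_NM_032291_1 range=chr1:67091530-67091593 5'pad=0 3'pad=0 strand=+ repeatMasking=none",
    ">hg19_refGene_NM_032291_2 range=chr1:67098753-67098777 5'pad=0 3'pad=0 strand=+ repeatMasking=none",
    ">hg19_refGene_NM_032291_3 range=chr1:67101627-67101698 5'pad=0 3'pad=0 strand=+ repeatMasking=none",
]

for row in compare(headers, exon_starts, exon_ends):
    print(row)
```

For these four records the end positions agree exactly with exonEnds, and the start positions are one higher than exonStarts, except for exon 0, whose header start is 218 bases past the table start.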


What should we see in your example? Where is the problem?


To give a specific example, the first exon from the list is:

>hg19_refGene_NM_032291_0 range=chr1:66999825-67000051 5'pad=0 3'pad=0 strand=+ repeatMasking=none
TTTCTCTCAGCATCTTCTTGGTAGCCTGCCTGTAGGTGAAGAAGCACCAG
CAGCATCCATGGCCTGTCTTTTGGCTTAACACTTATCTCCTTTGGCTTTG
ACAGCGGACGGAATAGACCTCAGCAGCGGCGTGGTGAGGACTTAGCTGGG
ACCTGGAATCGTATCCTCCTGTGTTTTTTCAGACTCCTTGGAAATTAAGG
AATGCAATTCTGCCACCATGATGGAAG

Now, the coordinates that are given for this exon are: 66999824 (START), 67000051 (STOP), while the exonFrame is 0.

The first issue is the mismatch between the start positions (by a single nucleotide here, though for other exons the mismatch can be 2 or more). The second issue is how to interpret the exonFrames variable. If the exonFrame is 0, is this with respect to the entire exon, or just with respect to the CDS region?
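One possible reading of the single-nucleotide difference is a coordinate convention rather than a data conflict: UCSC documents that its database tables store starts as 0-based and ends as 1-based (a half-open interval), while positions displayed in the browser and in sequence headers are 1-based and inclusive. If that is what is happening here, the conversion is a one-line shift. The function names below are hypothetical, chosen only for this sketch.

```python
def table_to_display(start0, end0):
    """Convert a 0-based half-open table interval [start0, end0)
    to a 1-based inclusive interval [start1, end1]."""
    return start0 + 1, end0

def display_to_table(start1, end1):
    """Inverse of table_to_display."""
    return start1 - 1, end1

def exon_length(start0, end0):
    """Length in bases of a 0-based half-open interval."""
    return end0 - start0

# First exon of NM_032291 as listed in the table (exonStarts, exonEnds).
table_start, table_end = 66999824, 67000051

print(table_to_display(table_start, table_end))
print(exon_length(table_start, table_end))
```

Under this assumption the table values 66999824/67000051 correspond to the header range chr1:66999825-67000051, and the interval is 227 bases long, which is also the length of the sequence shown for that exon.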


It is a bit hard to follow exactly what you are doing here... Can you provide the precise queries you are performing to obtain these data? Are you doing this manually in the browser or using an SQL query? What is your ultimate goal?


I've been working manually with the browser.

Basically, I need the following information: (1) the coding exon sequence (excluding 5' and 3' UTR), (2) the coordinates of the coding exon sequence, and (3) the reading frame (0, 1, 2) of the exon sequence.

Is there some way of obtaining this information with a single query?
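Whatever the query route, items (2) and (3) can in principle be derived from a single table row, provided the row also carries the transcript's cdsStart and cdsEnd (columns not shown in the table above; assumed here to be available alongside exonStarts/exonEnds). The sketch below clips each exon to the CDS and counts the frame over coding bases only. It handles the + strand only, treats all inputs as 0-based half-open, and uses a hypothetical function name. The `cds_start` value is inferred from the header range chr1:67000042-67000051 in the post; the `cds_end` value is a placeholder set to the end of the fourth exon purely to keep the example short, not the real value.

```python
def coding_exons(exon_starts, exon_ends, cds_start, cds_end):
    """Clip exons to the CDS and assign a frame to each coding piece.

    Inputs are 0-based half-open coordinates on the + strand.
    Returns (exon_index, start1, end1, frame) tuples, where start1/end1
    are 1-based inclusive and frame is the number of coding bases
    before this piece, modulo 3.
    """
    result = []
    coding_bases = 0
    for i, (s, e) in enumerate(zip(exon_starts, exon_ends)):
        cs, ce = max(s, cds_start), min(e, cds_end)
        if cs >= ce:
            continue  # exon lies entirely in a UTR
        result.append((i, cs + 1, ce, coding_bases % 3))
        coding_bases += ce - cs
    return result

# First four exons of NM_032291 from the post.
exon_starts = [66999824, 67091529, 67098752, 67101626]
exon_ends = [67000051, 67091593, 67098777, 67101698]
cds_start = 67000041  # inferred from the header range, see lead-in
cds_end = 67101698    # placeholder, NOT the transcript's real cdsEnd

for piece in coding_exons(exon_starts, exon_ends, cds_start, cds_end):
    print(piece)
```

With these inputs the clipped ranges come out as 67000042-67000051, 67091530-67091593, 67098753-67098777 and 67101627-67101698, and the frames as 0, 1, 2, 0. Those match the sequence header ranges and the first four exonFrames values in the post, which suggests the frame is counted with respect to the CDS portion rather than the whole exon. A minus-strand transcript would need the exons walked in reverse order.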


Is this question still relevant? If you have solved it, you should have posted your own answer. Besides, it isn't clear what is being asked here. Vote for closing.
