Question: Coordinate Mismatch For Exons From UCSC Genome
Max wrote:

I have been using UCSC's genome browser to extract exon sequences and their coordinates. One problem I've encountered is that the exonStarts/exonEnds values in the annotation tables (which also include exonFrames, etc.) are close to the coordinates reported with the sequences, but do not match them exactly.

For instance, notice that the exonStarts and exonEnds positions in the table row below are close to, but don't quite match, the ranges in the sequence headers that follow:

name    chrom    strand    exonStarts    exonEnds    exonFrames
NM_032291    chr1    +    66999824,67091529,67098752,67101626,67105459,67108492,67109226,67126195,67133212,67136677,67137626,67138963,67142686,67145360,67147551,67154830,67155872,67161116,67184976,67194946,67199430,67205017,67206340,67206954,67208755,    67000051,67091593,67098777,67101698,67105516,67108547,67109402,67126207,67133224,67136702,67137678,67139049,67142779,67145435,67148052,67154958,67155999,67161176,67185088,67195102,67199563,67205220,67206405,67207119,67210768,    0,1,2,0,0,0,1,0,0,0,1,2,1,1,1,1,0,1,1,2,2,0,2,1,1,


>hg19_refGene_NM_032291_0 range=chr1:67000042-67000051 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ATGATGGAAG
>hg19_refGene_NM_032291_1 range=chr1:67091530-67091593 5'pad=0 3'pad=0 strand=+ repeatMasking=none
GATTGAAAAAACGTACAAGGAAGGCCTTTGGAATACGGAAGAAAGAAAAG
GACACTGATTCTAC
>hg19_refGene_NM_032291_2 range=chr1:67098753-67098777 5'pad=0 3'pad=0 strand=+ repeatMasking=none
AGGTTCACCAGATAGAGATGGAATT
>hg19_refGene_NM_032291_3 range=chr1:67101627-67101698 5'pad=0 3'pad=0 strand=+ repeatMasking=none
CAGCCCAGCCCACACGAACCACCCTACAATAGCAAAGCAGAGTGTGCGCG
TGAAGGAGGAAAAAAAGTTTCG

Unfortunately, the sequences and the data tables have to be called separately, so I'm left with trying to resolve the matter from the conflicting data that I have.

written 6.4 years ago by Max • modified 5.4 years ago by Biostar

What should we see in your example? Where is the problem?

written 6.4 years ago by Pierre Lindenbaum

To give a specific example, the first exon from the list is:

>hg19_refGene_NM_032291_0 range=chr1:66999825-67000051 5'pad=0 3'pad=0 strand=+ repeatMasking=none
TTTCTCTCAGCATCTTCTTGGTAGCCTGCCTGTAGGTGAAGAAGCACCAG
CAGCATCCATGGCCTGTCTTTTGGCTTAACACTTATCTCCTTTGGCTTTG
ACAGCGGACGGAATAGACCTCAGCAGCGGCGTGGTGAGGACTTAGCTGGG
ACCTGGAATCGTATCCTCCTGTGTTTTTTCAGACTCCTTGGAAATTAAGG
AATGCAATTCTGCCACCATGATGGAAG

Now, the coordinates that are given for this exon are: 66999824 (START), 67000051 (STOP), while the exonFrame is 0.

The first issue is the mismatch between the start positions (by a single nucleotide here, though for other exons the mismatch can be 2 or more). The second issue is how to interpret the exonFrames variable. If the exon frame is 0, is this with respect to the entire exon, or just with respect to the CDS region?
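A likely explanation for the single-nucleotide offset is the coordinate convention: UCSC tables store starts as 0-based, half-open positions, while the ranges printed in the sequence headers are 1-based and closed. The sketch below is only an illustration using the first four exons from the row above; the helper names `to_one_based` and `exon_length` are made up for this example and are not part of any UCSC tool.

```python
# Values copied from the refGene row in the question (first four exons only).
exon_starts = [66999824, 67091529, 67098752, 67101626]
exon_ends = [67000051, 67091593, 67098777, 67101698]


def to_one_based(start0, end0):
    """Convert a 0-based, half-open interval to a 1-based, closed one."""
    return start0 + 1, end0


def exon_length(start0, end0):
    """Length of a 0-based, half-open interval."""
    return end0 - start0


for start0, end0 in zip(exon_starts, exon_ends):
    start1, end1 = to_one_based(start0, end0)
    print(f"chr1:{start1}-{end1} length={exon_length(start0, end0)}")
```

Under this reading, the table start 66999824 becomes chr1:66999825, which is the range shown in the header of the full first exon, and the computed length of 227 equals the number of bases in that sequence. Larger offsets, such as the 67000042 start in the question, look like the sequence was clipped to the coding region rather than a different convention.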

written 6.4 years ago by Max

It is a bit hard to follow exactly what you are doing here... Can you provide the precise queries you are performing to obtain these data? Are you doing this manually in the browser or using an SQL query? What is your ultimate goal?

written 6.4 years ago by Malachi Griffith

I've been working manually with the browser.

Basically, I need the following information: (1) the coding exon sequence (excluding 5' and 3' UTR), (2) the coordinates of the coding exon sequence, and (3) the reading frame (0, 1, 2) of the exon sequence.

Is there some way of obtaining this information with a single query?
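Whether one browser query can return all of this is not settled in this thread. As a rough sketch, the coding coordinates and frames can be derived from a single table row if it also carries the cdsStart and cdsEnd columns. In the example below, `cds_start` is inferred from the 67000042 range in the question, `cds_end` is a placeholder because the real value is not shown here, and the function name `coding_exons` is hypothetical. Only the plus strand is handled.

```python
# First four exons from the refGene row in the question (0-based, half-open).
exon_starts = [66999824, 67091529, 67098752, 67101626]
exon_ends = [67000051, 67091593, 67098777, 67101698]

# Assumption: inferred from the header range chr1:67000042-67000051.
cds_start = 67000041
# Assumption: placeholder only; the real cdsEnd is not given in the thread.
cds_end = exon_ends[-1]


def coding_exons(starts, ends, cds_start, cds_end):
    """Clip plus-strand exons to the CDS.

    Returns (start_1based, end, frame) tuples, where frame is the number
    of coding bases before the exon, modulo 3.
    """
    result = []
    coding_bases = 0
    for start0, end0 in zip(starts, ends):
        clipped_start = max(start0, cds_start)
        clipped_end = min(end0, cds_end)
        if clipped_start >= clipped_end:
            continue  # exon lies entirely in the UTR
        result.append((clipped_start + 1, clipped_end, coding_bases % 3))
        coding_bases += clipped_end - clipped_start
    return result


for start1, end1, frame in coding_exons(exon_starts, exon_ends, cds_start, cds_end):
    print(f"chr1:{start1}-{end1} frame={frame}")
```

For these four exons the computed frames are 0, 1, 2, 0, the same as the first four exonFrames values in the row, which is consistent with the frame being counted over coding bases only rather than over the whole exon.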

written 6.4 years ago by Max

Is this question still relevant? If you have solved it, you should post your own answer. Besides, it isn't clear what is being asked here. I vote to close.

written 5.4 years ago by Hugues