Question: Obtaining only coding exons from UCSC table browser
christylynn002 (United States) wrote, 5.0 years ago:

Hi all. I'm having trouble producing a file that contains only the coding portions of exons, with the UTRs excluded. I've obtained a file from the UCSC table browser that looks something like this:

#name    cdsStart    cdsEnd    exonCount    exonStarts    exonEnds
NM_017436    43088895    43089957    3    43088126,43091496,43116802,    43090003,43091637,43116876,
NM_001173466    53701272    53715249    15    53701239,53701628,53701835,53702065,53702218,53702508,53702743,53702940,53703384,53708081,53708877,53709118,53709510,53714348,53715126,    53701497,53701713,53701917,53702133,53702312,53702599,53702804,53703065,53703505,53708225,53708924,53709210,53709566,53714476,53715412,

I have the cdsStart and cdsEnd but what I want to do is to incorporate those starts and ends into the exonStarts and exonEnds so I can use this file for further analysis. For example, this is what I would want my output to look like:

#name    cdsStart    cdsEnd    exonCount    exonStarts    exonEnds
NM_017436    43088895    43089957    3    *43088895*,    *43089957*, 

For this example, cdsStart and cdsEnd both fall within the first exon, so I only want that exon to appear in my file. Is there an easy way to carry this out from the table browser, or do I need to modify the file? If so, any suggestions on how to do that?

Thank you!
Jorge Amigo (Santiago de Compostela, Spain) wrote, 5.0 years ago:

If you select the gene track of interest, restrict the query to a particular region (say chr21:33031597-33041570), request the output in BED format, and on the next page check "exons plus 0 bases at each end", you'll end up with output in this format, which may be what you're looking for:

chr21	33031934	33032154	NM_000454_exon_0_0_chr21_33031935_f	0	+
chr21	33036102	33036199	NM_000454_exon_1_0_chr21_33036103_f	0	+
chr21	33038761	33038831	NM_000454_exon_2_0_chr21_33038762_f	0	+
chr21	33039570	33039688	NM_000454_exon_3_0_chr21_33039571_f	0	+
chr21	33040783	33041243	NM_000454_exon_4_0_chr21_33040784_f	0	+
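If the BED rows need to be regrouped per transcript for downstream analysis, something like the following Python sketch could work. It assumes whitespace-separated BED rows like the ones above, and it infers the name pattern (`<transcript>_exon_<index>_...`) from those rows only. The `cds` alternative in the pattern is an assumption about how rows are named when the coding-exon option is used, and `group_by_transcript` is a hypothetical helper name.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Pattern inferred from the example rows, e.g.
# NM_000454_exon_0_0_chr21_33031935_f
# The "cds" alternative is an assumption, not confirmed by the rows above.
NAME_RE = re.compile(r"^(?P<tx>.+)_(?:exon|cds)_(?P<index>\d+)_")


def group_by_transcript(bed_lines: List[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Collect (start, end) intervals per transcript from BED rows."""
    grouped = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and header-style lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = line.split()[:4]
        match = NAME_RE.match(name)
        # Fall back to the full name if the pattern does not match.
        tx = match.group("tx") if match else name
        grouped[tx].append((int(start), int(end)))
    return dict(grouped)
```

Summing `end - start` over each transcript's intervals then gives the total exonic (or coding) length per transcript.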

 


This is the best solution if you don't need to bulk process data using MySQL.

(reply written 5.0 years ago by RamRS)

Does this give you every individual exon for the region? Are they only coding exons?

(reply written 5.0 years ago by christylynn002)

There's an option to select "coding exons" instead of "exons plus X bases at each end".

(reply written 5.0 years ago by Jorge Amigo)

Perfect. Thank you!

(reply written 5.0 years ago by christylynn002)
RamRS (Houston, TX) wrote, 5.0 years ago:

Read each line into an object with exonStarts and exonEnds as arrays. Replace the first element of the exonStarts array with the cdsStart value and the last element of the exonEnds array with the cdsEnd value.
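A minimal Python sketch of this approach, assuming whitespace-separated lines with the six columns shown in the question. The `Transcript` class and the `parse_coords`, `parse_line`, and `replace_outer_bounds` helpers are hypothetical names, not part of any UCSC tool. Note that it only gives the right answer when the CDS starts in the first exon and ends in the last one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    name: str
    cds_start: int
    cds_end: int
    exon_starts: List[int]
    exon_ends: List[int]


def parse_coords(field: str) -> List[int]:
    # UCSC writes coordinate lists with a trailing comma, e.g. "1,2,3,"
    return [int(x) for x in field.split(",") if x.strip()]


def parse_line(line: str) -> Transcript:
    # Columns: name, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds
    name, cds_start, cds_end, _count, starts, ends = line.split()
    return Transcript(name, int(cds_start), int(cds_end),
                      parse_coords(starts), parse_coords(ends))


def replace_outer_bounds(tx: Transcript) -> Transcript:
    # Naive version: overwrite only the first start and the last end.
    starts = [tx.cds_start] + tx.exon_starts[1:]
    ends = tx.exon_ends[:-1] + [tx.cds_end]
    return Transcript(tx.name, tx.cds_start, tx.cds_end, starts, ends)
```

Header lines beginning with `#` would need to be skipped before calling `parse_line`.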

What's curious is that this was a program one of my friends had to write as part of an interview. Is that the case with you as well?


I've been able to replace the first and last values as you proposed, but the problem is that cdsStart and cdsEnd are not necessarily in the first or last exons.

Haha, not for an interview. It's just an intermediate step for further analysis.

(reply written 5.0 years ago by christylynn002)

Oh, I forgot that cases exist where entire exons can be UTRs. To address the worst case scenario, you can process all exons.

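Compare each exonEnds[i] to cdsStart and cdsEnd. The first exon with exonEnds[i] > cdsStart is the one that contains cdsStart, so set exonStarts[i] = cdsStart there. You will encounter this first, so stop checking for cdsStart after you assign it to the right exonStart (maybe set a flag). Similarly, the first exon with exonEnds[i] >= cdsEnd contains cdsEnd, so set exonEnds[i] = cdsEnd and exit the loop. Exons before the first of these or after the second lie entirely in a UTR and can be dropped.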

This will work as long as you're processing both exonStarts and exonEnds arrays simultaneously.
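The loop described above can be sketched in Python as follows. This is only an illustration: `clip_to_cds` is a hypothetical helper name, the coordinates are assumed to be the 0-based, half-open values UCSC uses in its gene tables, and each exonStarts entry is assumed to pair with the exonEnds entry at the same index. Rather than tracking a flag, it clips every exon with max/min and drops exons that fall entirely in a UTR, which should give the same result.

```python
from typing import List, Tuple


def clip_to_cds(cds_start: int, cds_end: int,
                exon_starts: List[int],
                exon_ends: List[int]) -> Tuple[List[int], List[int]]:
    """Return exon starts/ends restricted to the coding region."""
    new_starts, new_ends = [], []
    # Walk both arrays together so each start stays paired with its end.
    for start, end in zip(exon_starts, exon_ends):
        # Skip exons that lie entirely in the 5' or 3' UTR.
        if end <= cds_start or start >= cds_end:
            continue
        new_starts.append(max(start, cds_start))
        new_ends.append(min(end, cds_end))
    return new_starts, new_ends


# First record from the question (NM_017436).
starts, ends = clip_to_cds(
    43088895, 43089957,
    [43088126, 43091496, 43116802],
    [43090003, 43091637, 43116876],
)
print(starts, ends)  # [43088895] [43089957]
```

For NM_017436 only the first exon overlaps the CDS, so the other two are dropped, matching the output requested in the question. The exonCount column would need to be updated to the length of the returned lists.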

(reply written 5.0 years ago by RamRS, modified 8 months ago)