Question: Help understanding the non-coding exons in a gene.
Asked 5.0 years ago by Affan (Canada):

I am currently reading a paper that refers to the protein Aldolase A. I am researching binding sites for the transcription factor (TF) MEF2, and the paper mentions that one binding site is located in the skeletal muscle-specific enhancer of Aldolase A. The information given is: accession number X06351, with center 1985/1986.

So on NCBI, it leads me to this page: Accession X06351. Now, reading the PubMed abstract and going over the page there (keep in mind, I am not a biologist, so I am not really sure what I am looking at half the time), I am under the impression that the sequence is only part of the actual gene for aldolase A and only shows non-coding exons (I thought exons are always coding).

So that's got me confused. To clear up my understanding, I went to the actual Aldolase A 365 AA protein page. This really doesn't tell me anything, so I clicked the link at the start of the page that says

DBSOURCE    embl accession X12447.1

Clicking on this (which I presume leads me to the sequence that codes for this protein, correct?) leads me to this page. Okay, so I thought this page includes the ENTIRE sequence that codes for the protein. So somewhere in there, my original partial sequence (X06351) should be buried. But it's not! So can someone explain what I am really looking at? What links did I actually follow?

An idea: I think I may see where my misunderstanding lies. If I recall correctly, an enhancer may be located relatively far away from the actual gene. Therefore, the DNA sequence from my accession X06351 may not actually be buried in the actual gene sequence.

A secondary question: Is there a way to get to the enhancer/promoter regions for a gene from the NCBI page? For example, where can I get enhancer/promoter region information from the gene page of Aldolase A?

"I thought exons are always coding (for-a-protein)": wrong. One or more exons in a transcript can consist partly or entirely of 5' or 3' UTR (the sequence before the ATG / after the stop codon).

Reply written 5.0 years ago by Pierre Lindenbaum
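
To make the point about non-coding exons concrete, here is a minimal Python sketch that splits each exon of a transcript into 5' UTR, coding, and 3' UTR pieces. The coordinates are hypothetical (they are not taken from ALDOA or X06351), the function name `split_exons` is made up for this example, and a forward-strand transcript with half-open intervals is assumed.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end), forward strand assumed


def split_exons(exons: List[Interval], cds_start: int,
                cds_end: int) -> List[Dict[str, Optional[Interval]]]:
    """Split each exon into 5' UTR, coding, and 3' UTR pieces.

    cds_start is the first base of the start codon (ATG);
    cds_end is one past the last base of the stop codon.
    """
    parts = []
    for start, end in exons:
        utr5 = (start, min(end, cds_start))
        coding = (max(start, cds_start), min(end, cds_end))
        utr3 = (max(start, cds_end), end)
        parts.append({
            "exon": (start, end),
            # An empty or inverted interval means the exon has no such piece.
            "utr5": utr5 if utr5[0] < utr5[1] else None,
            "coding": coding if coding[0] < coding[1] else None,
            "utr3": utr3 if utr3[0] < utr3[1] else None,
        })
    return parts


# Hypothetical transcript: three exons, coding sequence from 250 to 620.
exons = [(100, 180), (240, 400), (500, 700)]
parts = split_exons(exons, cds_start=250, cds_end=620)
for p in parts:
    print(p)
```

In this toy transcript the first exon lies entirely upstream of the start codon, so it is an exon with no coding sequence at all, while the second and third exons are part UTR and part coding.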

You'll want to read the Wikipedia pages on 5' and 3' UTRs. As a general rule, it's helpful to have taken a genomics class to easily understand papers dealing with genomics.

Reply written 5.0 years ago by Devon Ryan
Answer written 5.0 years ago by Cyriac Kandoth (Memorial Sloan Kettering, New York, USA):

Here is a nice picture from Ensembl showing the relative positions of protein-coding exons, introns, untranslated regions (UTRs), promoters, transcription factors, etc. An "exon" doesn't necessarily imply protein coding. In eukaryotes, splicing can also happen in non-coding genes.


Thanks, that's an awesome picture. But I am still looking for an answer. Is my "idea" correct in my post?

 

Reply written 5.0 years ago by Affan

Sort of. The X06351 sequence is a fragment of chromosome 16. I'm guessing that in the paper you're reading they cloned this into something (a luciferase vector, or whatever else was appropriate at the time) and did promoter bashing. That sequence includes about 1.5 kb upstream of one of the start sites (from a long-ago annotation) and the first few exons of one of the transcripts. So if you look up the mRNA or cDNA sequence, you won't find a perfect match. If you BLAST that sequence and then click on "graphics", much of your confusion should be cleared up.
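
As a rough illustration of why a genomic fragment does not match the mRNA, the toy Python sketch below builds a genomic fragment from an upstream region, two exons, and an intron, and compares it with the spliced transcript. All sequences are invented for illustration and have nothing to do with the real ALDOA locus.

```python
# Toy sequences, invented for illustration only (not ALDOA, not X06351).
upstream = "TTGACCATGGCA"     # promoter/enhancer region, never transcribed
exon1 = "GGCTCAGT"            # non-coding first exon (5' UTR)
intron1 = "GTAAGTCCCTTTCAG"   # removed by splicing
exon2 = "ACCATGCCGTAA"        # contains the start codon ATG

# What a genomic clone of the region would contain.
genomic_fragment = upstream + exon1 + intron1 + exon2

# What the mature transcript (written as cDNA) contains: exons only.
mrna = exon1 + exon2

fragment_in_mrna = genomic_fragment in mrna
exons_in_mrna = exon1 in mrna and exon2 in mrna
upstream_in_mrna = upstream in mrna
intron_in_mrna = intron1 in mrna

print("whole genomic fragment found in mRNA:", fragment_in_mrna)
print("both exons found in mRNA:", exons_in_mrna)
print("upstream region found in mRNA:", upstream_in_mrna)
print("intron found in mRNA:", intron_in_mrna)
```

Only the exon pieces of the genomic fragment can be found in the transcript; the upstream region and the intron cannot, which is why a search of the mRNA for the whole fragment fails.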

For finding promoters and enhancers, it depends on whether they're annotated (they're usually not). ENCODE has some datasets that have helped with higher-volume enhancer prediction, but without doing the needed molecular biology experiments you never know with certainty whether those regions are actually functional. In general, promoters span from 1-4 kb upstream of a transcript's start to 100 bp-1 kb within it. The numbers are vague, since there's a broad range and no formal definition.
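
The rule of thumb above can be turned into coordinates with a small Python sketch. The function name `promoter_window` and the default sizes (2000 bp upstream, 500 bp downstream) are assumptions picked from inside the ranges mentioned; they are not a formal definition, and the resulting window is only a candidate region to inspect.

```python
from typing import Tuple


def promoter_window(tss: int, strand: str, upstream: int = 2000,
                    downstream: int = 500) -> Tuple[int, int]:
    """Return (start, end) of a candidate promoter window around a TSS.

    Coordinates are 1-based positions on the reference. "Upstream" is
    relative to the direction of transcription, so the window flips for
    genes on the minus strand.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Do not run off the start of the chromosome.
    return max(1, start), end


# Hypothetical transcription start sites, for illustration only.
print(promoter_window(10000, "+"))
print(promoter_window(10000, "-"))
```

For a minus-strand gene the upstream sequence has higher coordinates than the start site, which is easy to get wrong when reading positions off a genome browser.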

Reply written 5.0 years ago by Devon Ryan

Thanks Devon.

  • So does that mean that, in general, an entry for a gene (say, Aldolase A) shows everything? That is, how do I read that NCBI page to find out what the upstream/downstream features are (enhancers, promoter regions, and so on)? From the link, it seems you can. However, they seem to have promoter regions inside the gene. Is that possible?
  • How can you tell that the sequence I posted is on chr16?
  • Also, I am still trying to understand what this means: "That sequence includes about 1.5kb upstream of one of the start site (from a long ago annotation) and the first few exons of one of the transcripts. So if you look up the mRNA or CDNA sequence, you won't find a perfect match. If you blast that sequence and then click on "graphics", much of your confusion should be cleared up."

Reply written 5.0 years ago (modified 5.0 years ago) by Affan

The gene ALDOA is on chr16 per the wiki link you sent. NCBI accession X06351 is dated March 1991, back in the days of PCR and Sanger sequencing, when labs focused on sequencing and studying one gene at a time, well before the Human Genome Project was completed. But kids these days get to use genome browsers from UCSC or Ensembl with all the genes neatly mapped to a human reference sequence.

In Ensembl's mapping of ALDOA on the GRCh38 reference, take a look at the track named "Regulatory Features", to get an idea of where these promoter sequences lie, relative to the various isoforms of ALDOA.

Reply written 5.0 years ago (modified 5.0 years ago) by Cyriac Kandoth

As Cyriac said, have a look at the Ensembl regulatory features track. That'll often be more helpful than the NCBI entry.

I just BLASTed the sequence; that's how I knew where it aligned. If you're not familiar with BLAST, you'll want to give it a try. It's a core tool in biology and bioinformatics.

I don't know what you don't understand about the sentence you mentioned. Just BLAST the sequence and it should be clear what that means.
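
If the BLAST search is run from the command line instead of the web page, the hits can be read programmatically. The Python sketch below assumes BLAST+ tabular output (`-outfmt 6`) with its default twelve columns and picks the hit with the highest bit score. The two sample rows are made up for illustration (placeholder names `query1`, `chrA`, `chrB`) and are not real results for X06351; the helper name `best_hit` is also hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit(tabular_text: str) -> dict:
    """Return the row with the highest bit score from -outfmt 6 text."""
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=COLUMNS, delimiter="\t")
    # Skip comment lines, which appear if -outfmt 7 was used instead.
    hits = [row for row in reader if not row["qseqid"].startswith("#")]
    if not hits:
        raise ValueError("no hits found")
    return max(hits, key=lambda row: float(row["bitscore"]))


# Made-up rows for illustration only; not real BLAST results.
sample_rows = [
    ["query1", "chrA", "99.5", "2000", "10", "0",
     "1", "2000", "500001", "502000", "0.0", "3600"],
    ["query1", "chrB", "85.0", "120", "18", "0",
     "300", "419", "7001", "7120", "1e-20", "95.3"],
]
sample_text = "\n".join("\t".join(row) for row in sample_rows) + "\n"

hit = best_hit(sample_text)
print(hit["sseqid"], hit["sstart"], hit["send"])
```

The subject ID and subject start/end of the top hit are what tell you which chromosome, and roughly where on it, the query sequence aligns.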

Reply written 5.0 years ago by Devon Ryan

Thanks for that pictorial representation. Just to clarify, here the blue diamond is the distal enhancer in the genomic context of where the mature transcript (in orange) comes from, right?

Reply written 4.0 years ago by shrutiisarda