Does TSS = first exon?
3
6
7.3 years ago
NHEJ ▴ 330

Could anyone please clarify whether or not in computational studies one can treat the transcription start site (TSS) as being equivalent to the first exon?

Some seemingly contradictory quotes (and sources) that I've found on this issue:

"Promoter sequences are usually the sequence immediately upstream the transcription start site (TSS) or first exon." (SOURCE: http://www.protocol-online.org/forums/blog/4/entry-10-from-how-to-find-promoter-sequences/)

"The TSS is the first nucleotide of the UTR (at least I think so, I don't think there's any gene which immediately begins with ATG), so yes, UTRs can also be 'relaxed' and differ in length." (SOURCE: http://www.protocol-online.org/forums/index.php?app=forums&module=forums&section=printtopic&client=printer&f=1&t=6050)

"No, the first codon of the first exon is the start codon "ATG" which also codes for methionine. This is called the translation start site.  The transcription start site is where the RNA polymerase binds to in the 5' UTR upstream of the start codon. IMHO  Maybe someone else can elaborate more. I dont want to give you the incorrect info."  (SOURCE: http://seqanswers.com/forums/archive/index.php/t-12773.html)

transcription start sites exon • 28k views
6
7.3 years ago

These terms always need to be defined with respect to an authoritative resource, not posts on websites.

This is why the Sequence Ontology exists:

http://www.sequenceontology.org/index.html

Its entry for TSS is here:

http://www.sequenceontology.org/browser/current_svn/term/SO:0000315

There you can investigate the proper context.

0

Could you please elaborate on Steve Lianoglou's comment below?

0

That comment was attached to my answer, which I should not have posted whilst half-asleep :) and have since deleted. I was thinking exclusively about protein-coding genes, in which the first exon is the translational start of the protein. However, an exon can also be defined as "what's left in the mature RNA after splicing" and may have nothing to do with protein coding. In any case, the TSS is not equivalent to the first exon.

1

Sorry to be a mosquito, but I'll still argue that even in protein coding genes, the first exon is not defined by the translation start. An exon is (and should only ever be) defined by "what's left in the mature RNA after splicing", and (for instance) the "spliced bits" in the 5'UTR of the human ALAS1 gene are still called "exons."

Exons are defined by the ~~splicing machinery~~ RNA processing machinery, not the translational machinery.

I've struck through "splicing machinery" because we have cases like XBP1, which is post-transcriptionally processed by ERN1, which splits one exon into two -- and I don't think anyone would call ERN1 part of any splicing machinery.

All that having been said -- do you have any references where people are actually going by the definition you are proposing?

0

I'm not proposing a definition; I'm using sloppy language, which you are doing a great job of making less sloppy. I agree, exons are not "defined" by translational machinery. I guess I was trying to keep things simple in the context of the original question, to which the answer is "no, TSS is not first exon".

P.S. in my day, i.e. about 20 years ago, we remembered which were introns and which were exons by "exons are expressed". This newfangled definition of "exons are what's left after splicing" does not sit well in my old brain at all :)

0

>to which the answer is "no, TSS is not first exon".

Please give an example in which the TSS is not the start of the first exon. I have shown below that for all the major mouse annotations TSS == start of first exon. I would say that biochemically the reason for this, at least for protein-coding mRNAs, is the 5' cap, which is attached to the 5' end at transcription initiation and is necessary for export, translation (splicing?), etc.

0

Circular RNAs might be a counterexample.

0

How I wish I could turn back time and start again with my answers; it appears I'm just confusing myself and everyone else :)

OK: if we are including UTRs in the definition of exon and if we're assuming that transcript starts in the UCSC database really are transcript starts (I have always wondered how many are experimentally-determined) then yes - the TSS is equivalent to the first position in the first exon.

If you're an old fart like me who was taught that exon = expressed region - which is now incorrect - then the TSS has nothing to do with exons at all.

I hope this helps.

0

The Sequence Ontology also defines an exon as:

A region of the transcript sequence within a gene which is not removed from the primary RNA transcript by RNA splicing.

But of course, one thing I learned in biology is that there are always exceptions, as Steve Lianoglou points out.

0

How does one computationally find the TSS of a gene? What if there are many genes involved and you must resort to computational methods (not experimental methods like RACE)?

0

So far as I know, whilst there are computational methods for TSS prediction (which you can find by web/literature search), only experimental methods such as RACE can actually establish where transcription starts.
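If annotated (rather than experimentally confirmed) start sites are good enough, a rough sketch of pulling them out for many genes at once might look like the following. It assumes you already have transcript records with 0-based, half-open coordinates, as in the UCSC gene tables; the function name and the tuple layout are made up for illustration, and the result is only as trustworthy as the annotation behind it.

```python
from collections import defaultdict


def annotated_tss_by_gene(rows):
    """Map each gene to the sorted, distinct annotated TSS positions of its transcripts.

    rows: iterable of (gene, chrom, strand, tx_start, tx_end) tuples with
    0-based, half-open coordinates (a hypothetical input layout).
    Returned positions are 0-based.
    """
    sites = defaultdict(set)
    for gene, chrom, strand, tx_start, tx_end in rows:
        if strand == "+":
            tss = tx_start        # transcription begins at the low coordinate
        elif strand == "-":
            tss = tx_end - 1      # transcription begins at the high coordinate
        else:
            raise ValueError(f"unexpected strand: {strand!r}")
        sites[gene].add((chrom, tss))
    # A gene with several transcripts may have several annotated TSSs.
    return {gene: sorted(positions) for gene, positions in sites.items()}


if __name__ == "__main__":
    # Made-up records, not real annotation data.
    example = [
        ("GeneA", "chr1", "+", 100, 500),
        ("GeneA", "chr1", "+", 150, 500),
        ("GeneB", "chr2", "-", 1000, 2000),
    ]
    for gene, positions in annotated_tss_by_gene(example).items():
        print(gene, positions)
```

Note that a gene can end up with more than one annotated TSS, so "the TSS of a gene" is often really a set of positions, one per transcript start.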

0
7.3 years ago
Ido Tamir 5.2k
mysql --user=mm9 --host=genome-mysql.cse.ucsc.edu -A

mysql> use mm9;
mysql> select COUNT(*) from refGene where txStart != substring_index(exonStarts, ",", 1);
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (0.25 sec)

mysql> select COUNT(*) from refGene where txStart = substring_index(exonStarts, ",", 1);
+----------+
| COUNT(*) |
+----------+
|    34025 |
+----------+
1 row in set (0.33 sec)

mysql> select COUNT(*) from refGene;
+----------+
| COUNT(*) |
+----------+
|    34025 |
+----------+
1 row in set (0.18 sec)

mysql> select COUNT(*) from ensGene where txStart != substring_index(exonStarts, ",", 1);
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (0.36 sec)

mysql> select COUNT(*) from knownGene  where txStart != substring_index(exonStarts, ",", 1);
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (0.31 sec)

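CAVEAT: strand direction! In these tables txStart is always the lower genomic coordinate, so for transcripts on the minus strand the TSS is at txEnd (the end of the last listed exon), not at txStart.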
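To make the strand caveat concrete, here is a minimal sketch that takes a row shaped like the genePred columns queried above (strand, txStart, txEnd, exonStarts, exonEnds) and checks the TSS against the 5'-most exon in transcript orientation. It assumes 0-based, half-open coordinates and comma-separated exon lists with a trailing comma; the class and function names are my own, not part of any UCSC tool.

```python
from typing import NamedTuple


class GenePredRow(NamedTuple):
    """Subset of the columns of a genePred-style table such as refGene."""
    name: str
    strand: str       # "+" or "-"
    tx_start: int     # 0-based, inclusive (lower genomic coordinate)
    tx_end: int       # 0-based, exclusive (higher genomic coordinate)
    exon_starts: str  # e.g. "100,300,"
    exon_ends: str    # e.g. "200,400,"


def parse_coords(field: str) -> list[int]:
    """Split a comma-separated coordinate list, tolerating a trailing comma."""
    return [int(x) for x in field.split(",") if x]


def tss_position(row: GenePredRow) -> int:
    """Return the 0-based position of the first transcribed base."""
    if row.strand == "+":
        return row.tx_start
    if row.strand == "-":
        return row.tx_end - 1
    raise ValueError(f"unexpected strand: {row.strand!r}")


def first_exon(row: GenePredRow) -> tuple[int, int]:
    """Return (start, end) of the 5'-most exon in transcript orientation."""
    starts = parse_coords(row.exon_starts)
    ends = parse_coords(row.exon_ends)
    index = 0 if row.strand == "+" else -1  # last listed exon on the minus strand
    return starts[index], ends[index]


def tss_is_first_exon_boundary(row: GenePredRow) -> bool:
    """Check that the TSS coincides with the 5' edge of the first exon."""
    start, end = first_exon(row)
    if row.strand == "+":
        return tss_position(row) == start
    return tss_position(row) == end - 1


if __name__ == "__main__":
    # Made-up rows with identical coordinates on opposite strands.
    fwd = GenePredRow("tx_fwd", "+", 100, 400, "100,300,", "200,400,")
    rev = GenePredRow("tx_rev", "-", 100, 400, "100,300,", "200,400,")
    for row in (fwd, rev):
        print(row.name, tss_position(row), first_exon(row),
              tss_is_first_exon_boundary(row))
```

The same genomic coordinates give a different TSS and a different first exon depending on strand, which is why comparing only txStart with the first entry of exonStarts tells you about the lower boundary of the transcript rather than about the TSS for minus-strand genes.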