What is the first exon in negative strand genes?
2
1
8.8 years ago

Hello Biostars!

I would like to extract locations (starting point and ending point) for some regions characterizing genes.

Using Biomart, for example, I can extract for a gene:

  • Gene_Start_(bp)
  • Gene_End_(bp)
  • Transcription_Start_Site_(TSS)
  • Exon_Chr_Start_(bp)
  • Exon_Chr_End_(bp)
  • 5'_UTR_Start
  • 5'_UTR_End
  • 3'_UTR_Start
  • 3'_UTR_End

In Biomart, by convention, locations grow from left to right.

For a gene on the positive strand, it is quite trivial to find the first exon, and to compute the gene body (from the end of the first exon to the beginning of the rightmost 3'_UTR_Start).

But how to deal with genes on the negative strand?

I know that in nature, genes on the negative strand are transcribed from right to left. What should I consider as the first exon for a gene on the negative strand?

cheers

gene • 4.7k views
3
8.8 years ago

Be careful not to confuse the coordinate system with the transcription direction. In "nature" all genes are transcribed in the same direction (5' to 3') relative to the template strand; nature does not know whether it is using the positive or the negative strand, it is all the same.

But since the coordinate system is relative to the positive strand, the data obtained for the negative strand is often flipped. I say "often" because the jury is still out on what one ought to get back when querying a database for "Gene Start": should it be the leftmost coordinate of the gene (which happens to be Gene Start for genes on the + strand and Gene End for genes on the - strand), or should it be the actual 5' end of the gene?

Basically, you need to verify what your query returns; you may or may not need to reverse that order, depending on what the query does.
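As a rough illustration of the "reverse or not" step, here is a minimal Python sketch. It assumes the query returned exons of a single transcript as (start, end) pairs in forward-strand coordinates with start <= end, and a strand of 1 or -1. The function names and the example coordinates are made up for illustration; they are not part of Biomart.

```python
def exons_in_transcription_order(exons, strand):
    """Sort (start, end) exon pairs into the order they are transcribed.

    Assumes forward-strand coordinates with start <= end for every exon,
    and that all exons belong to the same transcript.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    # On the + strand the leftmost exon is transcribed first;
    # on the - strand the rightmost exon is transcribed first.
    return sorted(exons, key=lambda exon: exon[0], reverse=(strand == -1))


def first_exon(exons, strand):
    """Return the exon that is transcribed first (the most 5' one)."""
    return exons_in_transcription_order(exons, strand)[0]


# Hypothetical coordinates, for illustration only.
exons = [(300, 400), (100, 200), (500, 600)]
print(first_exon(exons, 1))   # (100, 200)
print(first_exon(exons, -1))  # (500, 600)
```

If your query already returns exons ranked by transcription order, this step is unnecessary, which is exactly why it is worth checking first.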

0

True. In my experience, the genome browser returns the actual start of the exon, which I think is the correct 5' end. Quick question: What do you mean by "In "nature" all genes are transcribed in the same direction (5' to 3') relative to the template strand"?

0

I just quoted the words the original poster used, "in nature", trying to emphasize that when transcription takes place there is no left and right. That only comes from the coordinate system we chose.

(But then calling it just 5' and 3' can in turn also be confusing, as now one needs to state 5' of what? The RNA is produced 5' -> 3', but the polymerase traverses the template 3' -> 5'.)

0

Thank you - I did not know that RNA polymerase transcribes in a strand insensitive manner. How is the information passed on to ribosomes on the translation start sites?

0

Wait, that is not what I meant to say! The transcribed sequence is always produced 5' to 3'. What it does not do is go left to right or right to left; that is all that I meant.

If one were to obtain the sequence from the reverse strand then one would not need to go "backwards". It is only when we have a coordinate system relative to the forward strand that we need to keep track of reversing it.

0

OK, so the transcript is always produced 5' to 3' from the relevant template strand (exon 1 to exon n). Our coordinate system is based on one strand, which is where the confusion originates: indexes have to be adapted to the forward strand.

0

Many thanks for your answers,

To be more precise this is the situation, given these two genes:

chromosome   hgnc      start      end        strand
chr13        HMGA1P6   23708313   23708703   1
chr13        RNY3P4    23726725   23726825   -1

The question is: in which direction are these genes transcribed (from the start to the end for +1, and from the end to the start for -1)?

PS: the data was retrieved from Biomart.

Cheers!

1

strand = 1 is transcribed from 'start' to 'end'. strand = -1 is transcribed from 'end' to 'start'.
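Applied to the two genes above, that rule can be sketched in a few lines of Python. The function name `transcription_span` is made up for illustration, and the sketch assumes 'start' and 'end' are the leftmost and rightmost forward-strand coordinates, as in the table.

```python
def transcription_span(start, end, strand):
    """Return (five_prime, three_prime) genomic coordinates of a gene.

    Assumes start <= end in forward-strand coordinates and strand in {1, -1}.
    """
    if strand == 1:
        return start, end   # transcribed left to right
    if strand == -1:
        return end, start   # transcribed right to left
    raise ValueError("strand must be 1 or -1")


# Coordinates copied from the table above.
genes = {
    "HMGA1P6": (23708313, 23708703, 1),
    "RNY3P4": (23726725, 23726825, -1),
}

for name, (start, end, strand) in genes.items():
    five_prime, three_prime = transcription_span(start, end, strand)
    print(name, five_prime, "->", three_prime)
# HMGA1P6 23708313 -> 23708703
# RNY3P4 23726825 -> 23726725
```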

0
8.8 years ago
Ram 43k

Exon 1 is always the exon that is transcribed first (the most 5'). This makes a lot of calculations tricky, but IIRC, the exon sequences themselves are usually retrieved the right way (right to left). You'll only need to manipulate the co-ordinates for any calculation.
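One example of such a coordinate manipulation is the distance of a genomic position from the 5' end of the gene. This is only a sketch: `offset_from_five_prime` is a made-up name, it treats the gene's start/end as the 5' end (ignoring alternative transcripts), and it assumes forward-strand coordinates with start <= end.

```python
def offset_from_five_prime(pos, start, end, strand):
    """Distance in bp from the gene's 5' end to pos, counted downstream.

    0 means pos is the 5' end itself; negative values are upstream of it.
    """
    if strand == 1:
        return pos - start   # 5' end is the leftmost coordinate
    if strand == -1:
        return end - pos     # 5' end is the rightmost coordinate
    raise ValueError("strand must be 1 or -1")


# Gene coordinates from this thread; the query positions are arbitrary.
print(offset_from_five_prime(23708323, 23708313, 23708703, 1))   # 10
print(offset_from_five_prime(23726800, 23726725, 23726825, -1))  # 25
```

The same position arithmetic works for exons, as long as the strand is carried along with every pair of coordinates.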
