Poly(A) Signal Prediction And Forward/Reverse Strand Conventions
10.7 years ago
Lídia ▴ 80

Hi!
It's my first time here, and I really thank you all for the useful content you submit. I read a couple of questions about the forward and reverse strand conventions, and it came as a great relief to see that I wasn't the only person wondering about such a basic but sometimes confusing thing.
I'm investigating a gene which is on the reverse strand so, according to the discussions I've read, the forward strand given by the databases is the template or antisense strand. Am I right?
Then, the poly(A) signal that usually is 5'-AAUAAA-3' in mRNA appears as its complementary sequence in the forward strand: 5'-TTTATT-3'

forward strand/template strand (antisense): 5'-TTTATT-3'
reverse strand/coding strand (sense): 3'-AAATAA-5'
mRNA: 3'-AAAUAA-5'
cDNA: 5'-TTTATT-3'

The problem is that there are both a 5'-TTTATT-3' and a 5'-AAATAA-3' sequence in the 3'UTR of this gene, and I wasn't sure which one could be the putative poly(A) signal.

Do you know if there is a program to predict the presence of regulatory sequences such as promoters, splice sites, polyadenylation sites, etc.?

Thank you very much in advance!

DNA is double-stranded. By convention, for a reference chromosome, one whole strand is designated the "forward strand" and the other the "reverse strand". This designation is arbitrary. Sometimes the terms "plus strand" and "minus strand" are used instead.
Visually (I'm not talking about the transcription machinery yet), you would typically read the sequence of a strand in the 5'-3' direction. For the forward strand, this means reading left-to-right, and for the reverse strand it means right-to-left.
A gene can live on a DNA strand in one of two orientations. The gene is said to have a coding strand (also known as its sense strand) and a template strand (also known as its antisense strand). For roughly half of all genes, the coding strand corresponds to the chromosome's forward strand, and for the rest it corresponds to the reverse strand.
The cDNA (and protein) sequence of a gene corresponds to the DNA sequence as read (again, visually) from the gene's coding strand. So the cDNA sequence always corresponds to the 5'-3' coding sequence of a gene.
Now, the RNA polymerase machinery moves along the DNA in the 5'-3' orientation of the coding strand (e.g. left-to-right for a forward-strand gene). It reads the bases from the template strand (so it is reading in the 3'-5' direction from the point of view of the template strand) and builds the RNA transcript as it goes. This means that the transcript, and the cDNA sequence derived from it, matches the coding sequence of the gene, not the template sequence. (This diagram from Wikipedia illustrates.)
Annotations such as Ensembl and UCSC are concerned with the coding sequences of genes, so when they say a gene is on the forward strand, it means the gene's coding sequence is on the forward strand. To follow through again, that means that during transcription of this forward-strand gene, the gene's template sequence is read from the reverse strand, producing a transcript (and hence a cDNA) that matches the sequence on the forward strand.
http://biostar.stackexchange.com/questions/3430/forward-and-reverse-strand-conventions
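
To make the convention above concrete, here is a minimal Python sketch. The function names (`reverse_complement`, `coding_strand`, `to_mrna`) and the toy sequence are illustrative assumptions and do not come from any database's API. It only handles unambiguous A/C/G/T bases.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def coding_strand(forward_seq: str, strand: str) -> str:
    """Return the gene's coding (sense) strand, written 5'->3'.

    forward_seq: forward-strand sequence, as stored in a reference FASTA.
    strand: '+' or '-', as reported by the annotation.
    """
    if strand == "+":
        return forward_seq
    if strand == "-":
        return reverse_complement(forward_seq)
    raise ValueError("strand must be '+' or '-'")


def to_mrna(coding_seq: str) -> str:
    """The transcript matches the coding strand, with U in place of T."""
    return coding_seq.upper().replace("T", "U")


# Made-up forward-strand fragment from a hypothetical reverse-strand gene.
forward_fragment = "GGCTTTATTCA"
sense = coding_strand(forward_fragment, "-")
print(sense)           # TGAATAAAGCC
print(to_mrna(sense))  # UGAAUAAAGCC
```

For a reverse-strand gene, the TTTATT seen in the forward-strand FASTA becomes AATAAA on the coding strand and AAUAAA in the mRNA.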

10.7 years ago

The problem is that there are both a 5'-TTTATT-3' and a 5'-AAATAA-3' sequence in the 3'UTR of this gene, and I wasn't sure which one could be the putative poly(A) signal.

If the gene is annotated to be on the negative (-) strand, then the (canonical) poly(A) signal you are hunting for will appear as TTTATT in the forward-strand genomic sequence, read 5'-3'.
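
As a rough illustration of that rule, the sketch below scans a forward-strand sequence for the canonical hexamer, choosing the motif from the annotated strand. The function name `find_signal_sites` and the example sequence are made up for this post, and only the canonical AATAAA is considered (real signals include variants, which this does not handle).

```python
import re
from typing import List

CANONICAL_SIGNAL = "AATAAA"  # DNA form of the mRNA hexamer AAUAAA


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_signal_sites(forward_seq: str, strand: str) -> List[int]:
    """Return 0-based forward-strand start positions of candidate signals.

    For a '+' gene the motif is AATAAA; for a '-' gene it is the
    reverse complement, TTTATT, because the FASTA shows the template strand.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    motif = CANONICAL_SIGNAL if strand == "+" else reverse_complement(CANONICAL_SIGNAL)
    # A lookahead pattern reports overlapping occurrences as well.
    return [m.start() for m in re.finditer(f"(?={motif})", forward_seq.upper())]


# Made-up forward-strand 3'UTR containing both AAATAA and TTTATT.
utr = "CCAAATAAGGTTTATTGG"
print(find_signal_sites(utr, "-"))  # [10]
print(find_signal_sites(utr, "+"))  # []
```

Note that AAATAA is the hexamer written backwards without complementing, so it matches the canonical signal on neither strand.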

Do you know if there is a program to predict the presence of regulatory sequences such as promoters, splicing sites, polyadenilation sites, etc. ?

Yes, lots.

In my opinion, the most successful methods to predict "promoters" are the more recent ones that go "beyond" just sequence and include data from different chromatin modification marks, which seem to be good landmarks for these things.

Searching Google Scholar for things like "predict splice sites" will turn up many hits. Brendan Frey recently (2010) published a cool paper in Nature on deciphering the splicing code, and with a little Google hunting you can find a talk about this work that he gave at Microsoft Research. That paper cites lots of previous work you can fish through as well.

Bin Tian has "polyA svm" for predicting pA signals. Looking at the papers he cites, and the papers that cite him, would be a good way to find related tools.


Thank you, thank you, thank you!! :D I am a Biology student and, despite being very interested in bioinformatics, I don't know much about it! I'm very eager to learn more!

10.7 years ago
Arun 2.4k

I am a bioinformatician and by no means an expert. However, I'll try to the best of my understanding to provide an answer.

I'm investigating a gene which is on the reverse strand so, according to the discussions I've read, the forward strand given by the databases is the template or antisense strand. Am I right?

From my understanding, yes, it's true. The strand whose sequence matches the transcript is the coding (or sense) strand, and the other, which the polymerase actually reads, is the template (or antisense) strand.

Then, the poly(A) signal that usually is 5'-AAUAAA-3' in mRNA appears as its complementary sequence in the forward strand: 5'-TTTATT-3'

I don't think this is true. When a gene is transcribed, the splicing machinery locates the introns and removes them, joining the exons to form the mRNA from the pre-mRNA. At more or less the same stage, a poly-A tail is added to the mRNA to protect it from degradation by other enzymes. Even so, mRNAs vary from stable to unstable. So, if you are looking at a FASTA file containing the reference genome, you won't see the poly-A (or poly-T) tail in the genome.


Thanks for your answer! I wasn't talking about the poly-A or poly-T tail but about the consensus signal located 10-30 nucleotides upstream of it. When pre-mRNA is processed, an enzyme cleaves the transcript 10-30 nt downstream of the poly(A) signal, which is usually 5'-AAUAAA-3'. The sequence itself isn't relevant... What I want to understand is what happens with genes that are on the reverse strand. Are you seeing the antisense sequence of these genes when you look at a FASTA file containing the reference genome?
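
The 10-30 nt spacing can be translated into forward-strand coordinates with a small sketch. It assumes the distance is measured from the last base of the hexamer in transcript orientation (the post does not say where it is measured from), and the function name `cleavage_window` and its parameters are hypothetical. Coordinates are 0-based and inclusive, and are not clipped to the sequence bounds.

```python
from typing import Tuple

HEXAMER_LEN = 6  # length of AATAAA / TTTATT


def cleavage_window(signal_start: int, strand: str,
                    min_gap: int = 10, max_gap: int = 30) -> Tuple[int, int]:
    """Return (low, high) forward-strand coordinates of the expected cleavage region.

    signal_start: 0-based forward-strand position where the hexamer begins
                  (AATAAA for '+' genes, TTTATT for '-' genes).
    Assumption: the gap is counted from the hexamer's last transcribed base.
    """
    if strand == "+":
        # Downstream in the transcript means increasing forward coordinates.
        signal_end = signal_start + HEXAMER_LEN - 1
        return signal_end + min_gap, signal_end + max_gap
    if strand == "-":
        # The transcript runs right-to-left on the forward strand, so the
        # hexamer's last transcribed base is at signal_start and
        # "downstream" means decreasing forward coordinates.
        return signal_start - max_gap, signal_start - min_gap
    raise ValueError("strand must be '+' or '-'")


print(cleavage_window(100, "+"))  # (115, 135)
print(cleavage_window(100, "-"))  # (70, 90)
```

So for a reverse-strand gene, the cleavage site sits to the left of TTTATT in the reference FASTA, not to the right.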