Question

Selecting RNA Database With PolyA

0

12.1 years ago

camelbbs ▴ 710

I see that many RNA-seq libraries are prepared by the polyA selection method. When I analyze differential gene expression, which gene annotation database is better to use? As far as I know, UCSC, RefSeq, GENCODE, etc. each provide a list of RNAs. But if I am comparing a library prepared with polyA selection against one prepared without it, I wonder whether it is better to use a list of polyadenylated RNAs as the annotation file. Thanks.

rnaseq mrna • 5.7k views

updated 12.1 years ago by Malachi Griffith 20k • written 12.1 years ago by camelbbs ▴ 710

3

Hi Camelbbs, I am sorry, but this is not really a question. It also doesn't make sense as a statement, and it sounds as if you are looking for an explanation and are confused about the process of transcription in eukaryotes. I will therefore delete your question unless you clarify it.

12.1 years ago by Michael 56k

0

I changed it. Thanks. I will make my question clearer.

12.1 years ago by camelbbs ▴ 710

0

OK, it is much better now. I think we had a similar question recently; I will look for it.

12.1 years ago by Michael 56k

3

Hi Camelbbs. I'm adding this comment to all your questions: Please take some time, before you ask a question, to think more about your problem and the most likely sources of answers (manuals, FAQs, Google, etc.). When you ask a question, include some context: tell us why you are asking, what result you need, etc. Most of your questions are vague or impossible to answer, or you changed them after an answer made it evident that they were not clear. Cheers.

12.1 years ago by Eric Normandeau 11k

1

Hi Eric, have you really read my questions before? Most of my questions have been answered very well. I don't think my questions are too vague to understand. Many of them are important in current research, and I believe many people have the same questions as me. Maybe I don't write my questions with detailed explanations, but if you are familiar with this area, you should know what I am asking. I don't see any answers from you, just comments like this one...

12.1 years ago by camelbbs ▴ 710

2

Well, there is no such thing as a "polyA gene" or a "polyA gene database". There are genes, which are transcribed into RNAs, some of which are then polyadenylated. So given that you have selected those transcripts as part of the RNA-seq protocol, you could then in principle use the sequence obtained to go back and query gene databases.

Is that along the lines of what you were trying to ask?

12.1 years ago by Neilfws 49k

score 4 · Answer 1 · 2013-05-31

Yes, many RNA-seq libraries are created by first performing a polyA+ selection on total RNA. This enriches for polyadenylated transcripts, so the library is assumed to be enriched for mRNAs. Remember that after an immature RNA is transcribed by RNA polymerase, it is spliced and polyadenylated, then exported from the nucleus to the cytoplasm, where proteins are translated from the mRNA template.

Total RNA is 95-98% ribosomal RNA (rRNA), rRNAs are NOT polyadenylated, and RNA sequencing samples fragments at random. PolyA selection is therefore one way to avoid sequencing mostly rRNA to incredible depth and obtaining sequence from almost nothing else.

It is fairly common to consider the polyA+ genes to be the same set as the protein-coding mRNAs. For a variety of reasons, people doing RNA-seq analysis sometimes focus on the subset of genes that are protein coding. In Ensembl you can obtain this set of genes by identifying those that have the 'transcript biotype' of 'protein_coding'.

For example, you can use Ensembl BioMart: select the species and database, then set a filter: Gene type -> protein_coding.

22719 of 62252 human genes (roughly 36%) in the latest version of Ensembl are protein coding. With perhaps a few exceptions (replication-dependent histone mRNAs being the best-known case), all of these should be polyadenylated.

You can also obtain GTF files from the Ensembl FTP server. Within these files, attributes such as 'gene_biotype' are again defined. You can therefore easily limit the annotation to particular types of genes such as 'protein_coding', 'miRNA', 'lincRNA', etc. In order to do that you will need to understand how GTF files work.
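As a rough illustration of the GTF filtering described above, here is a minimal Python sketch. It assumes the biotype is stored in the ninth (attribute) column under the key `gene_biotype`, as mentioned above; some GTF flavours use a different key (GENCODE files use `gene_type`) or store the biotype elsewhere, so check your own file first. The function names are made up for this example and are not part of any Ensembl tool.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTRIBUTE_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(field):
    """Return the quoted key/value pairs of a GTF attribute column as a dict."""
    return dict(ATTRIBUTE_RE.findall(field))


def filter_gtf_by_biotype(lines, biotypes, key="gene_biotype"):
    """Yield GTF lines whose `key` attribute is one of `biotypes`.

    `key` is an assumption about the file: "gene_biotype" here,
    but GENCODE GTFs would need "gene_type". Header/comment lines
    (starting with '#') are passed through unchanged.
    """
    wanted = set(biotypes)
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # skip records without an attribute column
        if parse_attributes(columns[8]).get(key) in wanted:
            yield line
```

To write a filtered annotation, you could open your GTF file, pass the file handle to `filter_gtf_by_biotype(handle, ["protein_coding"])`, and write the yielded lines to a new file. Because the filter works line by line, it keeps every feature (gene, transcript, exon, etc.) that carries a matching biotype.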

For an explanation of some of the terms above, refer to the following diagram:

[Figure: gene expression diagram]

score 2 · Answer 2 · 2013-05-30

2

12.1 years ago

k.nirmalraman ★ 1.1k

I suggest you read through the Wikipedia article on polyadenylation to re-evaluate the basis of your question.

12.1 years ago by k.nirmalraman ★ 1.1k