Question: Selecting an RNA Annotation Database for PolyA-Selected Libraries
camelbbs (650), China, wrote 6.1 years ago:

I see that many RNA-seq libraries are prepared by polyA selection, so when I analyze differential gene expression, which gene annotation database is better to use? As far as I know, UCSC, RefSeq, GENCODE, etc. each provide a list of RNAs. But if I am comparing a polyA-selected library with a non-polyA-selected library, I wonder whether it is better to use a list of polyadenylated RNAs as the annotation file. Thanks.

rnaseq mrna • 2.7k views
modified 6.1 years ago by Malachi Griffith (17k) • written 6.1 years ago by camelbbs (650)

Hi Camelbbs, I am sorry, but this is not really a question, and it doesn't make sense as a statement either. It sounds as if you are looking for an explanation and are confused about the process of transcription in eukaryotes. I will therefore delete your question unless you clarify it.

modified 6.1 years ago • written 6.1 years ago by Michael Dondrup (46k)

I changed it. Thanks. I will make my question clearer.

written 6.1 years ago by camelbbs (650)

Ok, it is much better now. I think we had a similar question lately, I will look for it.

written 6.1 years ago by Michael Dondrup (46k)

Hi Camelbbs. I'm adding this comment to all your questions: Please take some time, before you ask a question, to think more about your problems and most likely sources of answers (manuals, FAQs, Google!, etc.). When you ask a question, include some context, tell us why you ask that question, what result you need, etc. Most of your questions are vague, impossible to answer or you changed them following an answer because it became evident that it was not clear. Cheers.

modified 6.1 years ago • written 6.1 years ago by Eric Normandeau (10k)

Hi Eric, have you really read my questions before? Most of my questions have been answered very well. I don't think my questions are too vague to understand. Many of them are important in current research, and I believe many people have the same questions as me. Maybe I don't write my questions with detailed explanations, but if you are familiar with this area, you should know what I am asking. I don't see any answers from you, just comments like this one...

modified 6.1 years ago • written 6.1 years ago by camelbbs (650)

Well, there is no such thing as a "polyA gene" or a "polyA gene database". There are genes, which are transcribed into RNAs, some of which are then polyadenylated. So given that you have selected those transcripts as part of the RNA-seq protocol, you could then in principle use the sequence obtained to go back and query gene databases.

Is that along the lines of what you were trying to ask?

written 6.1 years ago by Neilfws (48k)
Malachi Griffith (17k), Washington University School of Medicine, St. Louis, USA, wrote 6.1 years ago:

Yes, many RNA-seq libraries are created by first performing a polyA+ selection of total RNA. This has the effect of enriching for transcripts that are polyadenylated and therefore assumed to be enriched for mRNAs. Remember that after transcription of an immature RNA by RNA polymerase and processing by the splicing machinery, transcripts are polyadenylated and exported from the nucleus to the cytoplasm before translation of proteins from the mRNA template can occur.

Since total RNA is 95-98% ribosomal RNA (rRNA), and rRNAs are NOT polyadenylated, and RNA sequencing involves random sequencing of fragments, polyA selection is one method of preventing the situation where one is mostly sequencing the rRNAs to incredible depth and obtaining sequence from almost nothing else.

It is fairly common to consider the polyA+ genes to be the same set as the protein coding mRNAs. For a variety of reasons, in RNA-seq analysis, people sometimes do focus on the subset of genes that are protein coding. In Ensembl you can obtain this set of genes by identifying those that have the 'transcript biotype' of 'protein coding'.

For example, in Ensembl BioMart, after selecting the database and species, you can set the filter: Gene type -> protein_coding.

22719 of 62252 human genes in the latest version of Ensembl are protein coding. With perhaps a few exceptions, all of these should be polyadenylated.

You can also obtain GTF files from the Ensembl FTP server. Within these files, attributes such as 'gene_biotype' are again defined, so you can easily limit the annotation to particular types of genes such as 'protein_coding', 'miRNA', 'lincRNA', etc. To do that, you will need to understand how GTF files work.
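As a rough illustration of that kind of filtering, here is a minimal Python sketch. It assumes a tab-separated GTF whose ninth column holds attributes of the form `key "value";` and that the biotype is stored under `gene_biotype` (some annotation sources use a different key, such as `gene_type`, hence the `key` parameter). The function names and the example records are made up for illustration and are not taken from a real Ensembl file.

```python
import re

# Matches attribute pairs in GTF column 9, e.g.: gene_biotype "protein_coding";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Return the GTF attribute column as a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def filter_gtf_lines(lines, biotypes, key="gene_biotype"):
    """Yield GTF lines whose biotype attribute (under `key`) is in `biotypes`."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        if parse_attributes(cols[8]).get(key) in biotypes:
            yield line


# Hypothetical records, for illustration only.
example = [
    '#!genome-build example\n',
    '1\thavana\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n',
    '1\thavana\tgene\t2000\t2100\t.\t-\t.\tgene_id "G2"; gene_biotype "miRNA";\n',
    '1\thavana\tgene\t5000\t9000\t.\t+\t.\tgene_id "G3"; gene_biotype "lincRNA";\n',
]

kept = list(filter_gtf_lines(example, {"protein_coding", "lincRNA"}))
for line in kept:
    print(line, end="")
```

For a real file, the same generator could be fed an open file handle instead of the `example` list, writing the kept lines to a new GTF.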

For reference to some of the terms above refer to the following diagram:

[Figure: Gene expression diagram]

modified 6.1 years ago • written 6.1 years ago by Malachi Griffith (17k)

Thanks so much, Malachi. I understand that "It is fairly common to consider the polyA+ genes to be the same set as the protein coding mRNAs". So does that mean polyA+ RNA-seq cannot be used to analyze lncRNAs? I see that some lncRNAs are also polyadenylated, right? Can we use a complete annotation covering all RNAs as the GTF and then filter the results by RPKM > 1, etc.?
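For concreteness, the RPKM > 1 filter mentioned here could be sketched as follows. This uses the standard definition RPKM = reads × 10^9 / (gene length in bp × total mapped reads). The gene names, counts, lengths, and library size are invented for illustration, and the threshold of 1 is simply the one proposed in the question, not a recommended cutoff.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expressed_genes(counts, lengths, total_mapped_reads, threshold=1.0):
    """Return {gene: RPKM} for genes whose RPKM exceeds `threshold`."""
    result = {}
    for gene, count in counts.items():
        value = rpkm(count, lengths[gene], total_mapped_reads)
        if value > threshold:
            result[gene] = value
    return result


# Hypothetical values, for illustration only.
counts = {"geneA": 500, "geneB": 2, "lncX": 40}
lengths = {"geneA": 2000, "geneB": 4000, "lncX": 1000}
library_size = 1_000_000  # assumed total number of mapped reads

kept = expressed_genes(counts, lengths, library_size)
print(sorted(kept))
```

In this sketch the filter is applied to every annotated gene regardless of biotype, so a polyadenylated lncRNA with enough reads would pass just like a protein-coding gene.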

modified 6.1 years ago • written 6.1 years ago by camelbbs (650)
k.nirmalraman (980), Germany, wrote 6.1 years ago:

I suggest you read through the Wikipedia article on polyadenylation to re-evaluate the basis of your question.

written 6.1 years ago by k.nirmalraman (980)
Powered by Biostar version 2.3.0