Question: What Does A Zero Expression Level Mean In The ENCODE RNA-Seq Data?
pengcui198930 wrote:

Dear all,

I'm new to bioinformatics and not very familiar with RNA-Seq data, so I have a simple question about it. I have downloaded the long polyA+ RNA-Seq data from ENCODE. It is in the Genes Gencode V7 view and contains expression levels (RPKM) for more than 50,000 genes. Many of these genes have an expression level of 0, and I am confused about whether they are really not expressed. They may indeed not be expressed, or they may be highly expressed as polyA- RNA or microRNA, which a polyA+ extraction would not capture. I think the same problem applies to the transcript-level data. So I don't know how to use the polyA+ RNA-Seq data to judge the expression of the genes at level 0.

Can anyone help me? Thank you very much!

Are you calculating R/FPKM yourself from aligned reads (BAM files), or are you downloading some summarized version of the data? If it's the latter, could you share the link so that we can see what you're talking about?

OK, it is the latter. The link is ''. I have also edited my question.

Mikael Huss wrote:

As you point out, this protocol picks up mostly polyadenylated transcripts, so it is not surprising that many Gencode genes show no expression: you would not expect any signal from non-poly-A transcripts or from microRNAs (the latter need a different protocol designed for short rather than long fragments). If you checked against RefSeq instead, you would probably see a higher proportion of non-zero RPKMs. Many transcripts are probably also genuinely unexpressed, since tissues don't express all transcripts available to them.
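One way to see this effect in a downloaded table is to tabulate the fraction of zero-RPKM genes per annotation biotype. The following is only a minimal sketch: the gene IDs, biotype labels, and RPKM values are invented toy data, and the function name `zero_fraction_by_biotype` is hypothetical. It assumes you have already attached a biotype to each gene (for example from the annotation file).

```python
from collections import defaultdict

# Hypothetical toy records: (gene_id, biotype, rpkm).
# All values are invented for illustration, not taken from ENCODE.
genes = [
    ("geneA", "protein_coding", 12.4),
    ("geneB", "protein_coding", 0.0),
    ("geneC", "miRNA", 0.0),
    ("geneD", "miRNA", 0.0),
    ("geneE", "snoRNA", 0.0),
    ("geneF", "lincRNA", 0.8),
]


def zero_fraction_by_biotype(records):
    """Return {biotype: fraction of genes whose RPKM is exactly 0}."""
    total = defaultdict(int)
    zeros = defaultdict(int)
    for _gene_id, biotype, rpkm in records:
        total[biotype] += 1
        if rpkm == 0:
            zeros[biotype] += 1
    return {biotype: zeros[biotype] / total[biotype] for biotype in total}


fractions = zero_fraction_by_biotype(genes)
for biotype, frac in sorted(fractions.items()):
    print(f"{biotype}: {frac:.2f} of genes at RPKM 0")
```

If the zeros in a real table cluster in short or non-polyadenylated classes, that would be consistent with the protocol explanation rather than with a lack of expression.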

So how can we use RNA-Seq data if we can't figure out whether a gene is expressed or not?

We can, but it isn't a perfect experiment that will magically give us all of the answers to life's mysteries :) If we are talking about poly-A transcripts, it means we are interested in protein-coding genes and not microRNAs. You can also do total RNA experiments and deplete the rRNA transcripts to look at mRNA and other RNA populations. So, for the gene whose expression level you are asking about, first ask whether you would expect to see it in your data at all. If not, you are looking at the wrong data. If you do expect it and see no reads, then it wasn't expressed in that experiment. Either it isn't normally expressed in that tissue/cell type, or it was down-regulated, turned off, or lost.
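The reasoning above can be written down as a small decision rule. This is just a sketch under stated assumptions: the set `EXPECTED_IN_LONG_POLYA_PLUS` and the function `interpret_rpkm` are hypothetical names, and which biotypes a long poly-A+ library can capture is simplified here to protein-coding genes only. Adjust the set to match what you actually expect from your protocol.

```python
# Assumption: biotypes treated as capturable by a long poly-A+ protocol.
# Simplified for illustration; extend it to match your own expectations.
EXPECTED_IN_LONG_POLYA_PLUS = {"protein_coding"}


def interpret_rpkm(biotype, rpkm):
    """Classify one gene's RPKM value from a long poly-A+ data set."""
    if rpkm > 0:
        return "detected"
    if biotype not in EXPECTED_IN_LONG_POLYA_PLUS:
        # A zero here says nothing about expression: the protocol
        # would not be expected to capture this transcript class.
        return "uninformative zero: wrong data for this gene"
    # Expected in this kind of library, but no reads were seen.
    return "not detected in this experiment"


print(interpret_rpkm("protein_coding", 0.0))
print(interpret_rpkm("miRNA", 0.0))
print(interpret_rpkm("protein_coding", 3.2))
```

Only the middle case is ambiguous in the sense of the original question; for those genes a small-RNA or total RNA data set would be the place to look.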

