Question: TPM values of expressed genes
17 months ago by
Palo Alto, CA, USA
Bogdan1000 wrote:

Dear all,

Considering an RNA-seq experiment and analysis that provides expression values as TPM, could you please let me know what minimum TPM value is needed to consider a gene expressed?

Regarding RPKM/FPKM units, I remember that a gene was considered expressed if RPKM (or FPKM) > 1. Thanks a lot,

-- bogdan

rna-seq • 4.0k views

Thank you very much for your comments and insights ;)

Reply written 17 months ago by Bogdan1000
17 months ago by
Kevin Blighe63k wrote:

I do not believe there is any definitive answer. So many factors go into each experiment that it is difficult to pick a single value. An RPKM / FPKM value of 1 seems quite low to me, i.e., in 'error' territory. What you have to consider is the distribution of your data and its suitability for whatever downstream tools you will use. If including low-count / lowly expressed genes is going to distort your data distribution and introduce biases, then you need to remove them; check via histograms.
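As a rough illustration of the "check via histograms" step, the sketch below bins genes by log2(TPM + 1) and applies a user-chosen cutoff. The gene names, TPM values, the helper names `log2_histogram` and `filter_expressed`, and the cutoff of 1.0 are all made up for the example; none of them is a recommendation.

```python
import math
from collections import Counter


def log2_histogram(tpm_values, bin_width=1.0):
    """Count genes per log2(TPM + 1) bin; returns {bin_lower_edge: count}."""
    counts = Counter()
    for tpm in tpm_values:
        edge = math.floor(math.log2(tpm + 1.0) / bin_width) * bin_width
        counts[edge] += 1
    return dict(sorted(counts.items()))


def filter_expressed(tpm_by_gene, min_tpm):
    """Keep genes whose TPM is at or above a cutoff chosen by the analyst."""
    return {gene: tpm for gene, tpm in tpm_by_gene.items() if tpm >= min_tpm}


# Hypothetical toy data, for illustration only.
tpm_by_gene = {"geneA": 0.0, "geneB": 0.4, "geneC": 2.5, "geneD": 40.0, "geneE": 310.0}

hist_all = log2_histogram(tpm_by_gene.values())
kept = filter_expressed(tpm_by_gene, min_tpm=1.0)  # 1.0 is an arbitrary example cutoff

# Crude text histogram: one '#' per gene in each bin.
for edge, n in hist_all.items():
    print(f"log2(TPM+1) from {edge:.0f}: {'#' * n}")
```

Comparing the histogram before and after filtering shows how much of the low-expression mass a given cutoff removes.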

From RNA-seq, most genes are lowly expressed, possibly due to transcriptional 'noise' more than anything else. I say 'noise' in the knowledge that these reads may reflect genuine transcription that has no regulatory function and is an artifact of other transcriptional processes. They may also reflect regions where TF binding and/or promoter activity was weak.

So, you have the liberty to choose your own cut-off for TPM and state it in the methods. :)

Please take the time to read Gordon's answer, here:


Edit: another interesting discussion:


Naive cutoffs will probably miss lowly-expressed but important genes. See the last paragraph of this post of Obi Griffith --- How Much Coverage Do We Need For An Rna-Seq Experiment?

Reply written 17 months ago by ATpoint36k (modified 17 months ago)
17 months ago by
United States
igor11k wrote:

As already pointed out, there is no ideal cutoff. However, there is at least one method, zFPKM, that tries to define an expression cutoff.



the community adopted several heuristics for RNA-seq analysis, most notably an arbitrary expression threshold of 0.3 - 1 FPKM for downstream analysis. However, advances in RNA-seq library preparation, sequencing technology, and informatic analysis have addressed many of the systemic sources of uncertainty and undermined the assumptions that drove the adoption of these heuristics. ... We use ENCODE data on chromatin state to show that ultralow-expression genes are predominantly associated with repressed chromatin; we provide a novel normalization metric, zFPKM, that identifies the threshold between active and background gene expression; and we show that this threshold is robust to experimental and analytical variations.
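To make the idea concrete, here is a minimal sketch of a zFPKM-style transform, not the published implementation. It assumes the "active" peak of the log2(FPKM) distribution is roughly Gaussian, estimates its centre from the most populated histogram bin (a simplification swapped in for a kernel density estimate), and estimates the spread from the values to the right of that centre using the half-normal relation sigma = mean(x - mu) * sqrt(pi / 2). The function names, the bin width, and the default cutoff of -3 are assumptions for illustration; check the paper and the zFPKM package for the actual procedure and threshold.

```python
import math
from collections import Counter


def zfpkm(fpkm_values, bin_width=0.25):
    """Z-score log2(FPKM) against a Gaussian fitted to the right half of the main peak.

    Simplified sketch: the peak is the most populated histogram bin (ties go to
    the right-most bin), not a kernel density maximum.
    """
    logs = [math.log2(v) for v in fpkm_values if v > 0]
    bins = Counter(math.floor(x / bin_width) for x in logs)
    top_bin = max(bins, key=lambda b: (bins[b], b))
    mu = (top_bin + 0.5) * bin_width  # centre of the modal bin

    right = [x for x in logs if x > mu]
    if not right:
        raise ValueError("no values to the right of the peak; cannot estimate spread")
    # Half-normal relation: E[x - mu | x > mu] = sigma * sqrt(2 / pi)
    sigma = (sum(right) / len(right) - mu) * math.sqrt(math.pi / 2)

    # FPKM of zero has no log; treat it as infinitely far below the peak.
    return [(math.log2(v) - mu) / sigma if v > 0 else float("-inf") for v in fpkm_values]


def is_active(z_values, cutoff=-3.0):
    """Flag genes at or above a zFPKM cutoff (the default is an assumed value)."""
    return [z >= cutoff for z in z_values]


# Hypothetical toy FPKM values, for illustration only.
example_fpkm = [2**3.1, 2**3.1, 2**3.2, 2**4.0, 2**5.0, 2**1.0, 2**-6.0, 0.0]
example_z = zfpkm(example_fpkm)
example_active = is_active(example_z)
```

On real data the distribution is usually bimodal, so a robust peak finder matters much more than it does in this toy example.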


I have been using zFPKM more and more in situations where I have encountered FPKM data. I believe I saw that you mentioned it in another post a few months back. Thanks Igor.

Reply written 17 months ago by Kevin Blighe63k
17 months ago by
Sheffield, UK
i.sudbery8.4k wrote:

There is no such thing as a cut off, because there is no such thing as not expressed: the whole genome is transcribed at some level in any given cell type. However, this doesn't stop us from sometimes needing to make a decision, for example which genes to include in a metagene analysis.

On a purely technical note, one could define a cut off as the point at which we can't distinguish between low expression levels and technical noise. It seems like zFPKM is doing something similar to this, but hand-rolled versions I've seen shift exons into nearby, but unexpressed, GC-matched genome regions and then quantify them to get an average signal for strictly unexpressed sequence. I guess the difference between this and zFPKM is that zFPKM gives you a level for "unexpressed genes", whereas this gives a baseline level for "not genes".

A different approach is to think about what the level actually means. This is next to impossible with FPKM, but FPKM is readily translatable into TPM, and then the meaning is quite concrete. For example, if the average cell has 200,000 mRNA molecules in it at any one time, then a TPM of 5 would translate to 1 molecule per cell on average at any one time.
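The arithmetic in this paragraph can be sketched as follows. TPM is FPKM rescaled so that the values in a sample sum to one million, and a TPM value then converts to molecules per cell once a total mRNA count is assumed. The 200,000 figure is the illustrative number from the paragraph above, not a measured value, and the function names are made up for the example.

```python
def fpkm_to_tpm(fpkm_values):
    """Rescale one sample's FPKM values so that they sum to one million."""
    total = sum(fpkm_values)
    if total <= 0:
        raise ValueError("FPKM values must sum to a positive number")
    return [v / total * 1_000_000 for v in fpkm_values]


def tpm_to_molecules_per_cell(tpm, mrna_per_cell=200_000):
    """Expected transcript copies per cell, given an ASSUMED total mRNA count."""
    return tpm * mrna_per_cell / 1_000_000


# With 200,000 mRNA molecules per cell, TPM 5 is about one molecule per cell.
copies_at_tpm5 = tpm_to_molecules_per_cell(5)
```

Because the result scales linearly with `mrna_per_cell`, a different assumption about total mRNA content shifts the "one molecule per cell" TPM proportionally.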

Finally, you could think in a distributional sense. I think that, in the last sample I looked at, a TPM of 5 put you in the top 10% most highly expressed genes.
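One simple way to look at a threshold distributionally is to ask what fraction of genes in a sample sits at or above it. The sketch below does that for a toy list of TPM values; the values and the function name are hypothetical and do not reproduce the sample mentioned above.

```python
def fraction_at_or_above(tpm_values, threshold):
    """Fraction of genes in one sample whose TPM is at or above the threshold."""
    values = list(tpm_values)
    if not values:
        raise ValueError("no TPM values given")
    return sum(1 for v in values if v >= threshold) / len(values)


# Hypothetical toy sample of ten genes, for illustration only.
toy_tpm = [0, 0, 0.5, 1, 2, 3, 4, 4.9, 5, 100]
top_fraction = fraction_at_or_above(toy_tpm, threshold=5)
```

Running this per sample shows whether a fixed TPM cutoff lands at a comparable point of the distribution across samples.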

In the end it depends on what the purpose of the threshold is.
