Question: p-values for log fold change in RPKM
peter pfand wrote (6.4 years ago):


I have calculated the log fold change values for RNA-Seq data and would like to estimate the significance of the results. I know DESeq does this already, but I want to do it manually after normalising the counts to RPKM.

Any ideas?


Thanks in advance

Devon Ryan (Freiburg, Germany) wrote (6.4 years ago):

Assuming you've truly done all of the required normalization, then you could just use a T-test or ANOVA (or other applicable linear model). Remember that you'll have lower power than a method like DESeq2 or edgeR since you'll not be using information sharing, but that's the simple manual route.
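As a rough illustration of the "simple manual route", here is a minimal sketch of a per-gene Welch t-test on log2-transformed RPKM values, using only the Python standard library. The function names, the pseudocount of 1, the choice of Welch's (unequal-variance) variant, and the toy RPKM values are all assumptions made for this example and are not from the thread. It shares no information across genes, so it will have lower power than DESeq2 or edgeR.

```python
import math
from statistics import mean, variance


def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) of Student's t for t >= 0.

    Uses the substitution x = sqrt(df) * tan(theta), which turns the tail
    integral into a bounded integral of cos(theta)**(df - 1), evaluated
    here with Simpson's rule (steps must be even).
    """
    coef = math.gamma((df + 1) / 2) / (math.sqrt(math.pi) * math.gamma(df / 2))
    upper = math.atan(t / math.sqrt(df))
    h = upper / steps
    total = 1.0 + math.cos(upper) ** (df - 1)  # endpoints theta = 0 and theta = upper
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    return max(0.0, 0.5 - coef * total * h / 3)


def welch_t_test(a, b):
    """Return (t statistic, two-sided p-value) for two groups of replicates."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    se2 = va + vb
    if se2 == 0:  # no variation at all: the test is undefined
        return float("nan"), float("nan")
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, 2 * t_sf(abs(t), df)


def per_gene_p_values(rpkm_by_gene, pseudocount=1.0):
    """Map gene -> (log2 fold change, t, p), testing each gene separately."""
    results = {}
    for gene, (group_a, group_b) in rpkm_by_gene.items():
        log_a = [math.log2(x + pseudocount) for x in group_a]
        log_b = [math.log2(x + pseudocount) for x in group_b]
        t, p = welch_t_test(log_b, log_a)
        results[gene] = (mean(log_b) - mean(log_a), t, p)
    return results


# Hypothetical RPKM values: three wild-type and three knockout replicates.
example = {
    "geneA": ([10.2, 11.5, 9.8], [30.1, 28.7, 33.4]),
    "geneB": ([5.0, 7.5, 6.1], [6.2, 5.4, 7.0]),
}
for gene, (lfc, t, p) in per_gene_p_values(example).items():
    print(f"{gene}: log2FC={lfc:.2f} t={t:.2f} p={p:.4g}")
```

With thousands of genes the resulting p-values would still need a multiple-testing correction before interpretation.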

BTW, why do you want to do this? The various count-based packages are pretty nice and it's usually not a good idea to reinvent the wheel unless you have a good reason.


I just want to compare different methods for my data, because the log fold change distribution is shifted in the case of RPKM, and in my case that makes sense (it looks a bit strange that all log fold change values are centered around 0 when, in my case, there is a gene that turns off all expression in the cell).

The output I want to get is the p-values for every gene after the log fold change, just like with DESeq.


written 6.4 years ago by peter pfand

Hey Devon,

If you have a log2(F.change) for each gene, a T-test or ANOVA gives the overall p-value of the library (population). So what would you suggest if you want to assign a p-value to each gene for wt vs. ko, which could tell us whether the F.change is significant or not?

Thanks !

written 6.4 years ago by Chirag Nepal

The T-test or ANOVA will give the per-gene p-values, since you're testing by gene (not directly comparing columns of genes from two samples against each other). In cases with no replicates, there are no really meaningful p-values possible (the best you can do is use something like GFold).

modified 6.4 years ago • written 6.4 years ago by Devon Ryan

Even if there are replicates, a t-test is not applicable unless you have many replicates (~10 × 2).

In most cases people do up to 3 replicates. Let's assume that the increase or decrease of gene X is random: if the gene expression increased in all 3 cases, it's like getting 3 heads in a row, which has probability 1/8.
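The coin-flip argument corresponds to a sign test. A small sketch follows, assuming each replicate is equally likely to go up or down under the null hypothesis; the function name `sign_test_p` is made up for this example.

```python
from math import comb


def sign_test_p(n_up, n, two_sided=False):
    """Probability of a result at least this extreme among n fair coin flips."""
    if two_sided:
        # Count the more extreme direction and double its tail.
        k = max(n_up, n - n_up)
        tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
        return min(1.0, 2 * tail)
    # One-sided: at least n_up of the n replicates increased.
    return sum(comb(n, i) for i in range(n_up, n + 1)) / 2 ** n


print(sign_test_p(3, 3))  # 0.125
```

So with three replicates the smallest attainable one-sided p-value from direction alone is 1/8, and it takes five concordant replicates (1/32) to get below 0.05.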

I think that most of the power of DESeq or cuffcompare (and my understanding of these tools is poor) lies in determining whether expression increased or decreased in a given experiment, i.e. whether the number of mRNA molecules of gene X differed between the two conditions. This doesn't mean that it will (most probably) happen again the next time you run the experiment.

written 6.4 years ago by Asaf

You don't need ~10 samples per compared group to use a T-test; that's simply nonsense as a general statement. In the special case of gene expression data that's certainly true, and of course even then your power is going to be terrible compared to DESeq/edgeR/etc., but that wasn't the question posed (and I made reference to the power issue anyway).

written 6.4 years ago by Devon Ryan

If you only have the fold-change values, you most definitely need more than 10 replicates for the suggested t-test to be applicable if you test each gene independently. I know that people do t-tests on triplicates, but that's just nonsense.

written 6.4 years ago by Asaf

Agreed. Note that I was replying to needing ~10 samples per group as a general requirement, not one specific to gene-expression.

written 6.4 years ago by Devon Ryan

In this tutorial ( they applied DESeq with 2-3 replicates for 2 conditions. My question is: how could I do the same but with the log2 fold change values from RPKM?

written 6.4 years ago by peter pfand

A log2(foldchange) in RPKM doesn't make any sense (that's like saying you have percentage changes measured in apples). I assume you have RPKMs for two groups and want to compare them. You can use a T-test, but as mentioned above the results won't be worth much. You're better off either not using RPKMs or using something like cuffdiff, which has somewhat different requirements.
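To make the units point concrete: RPKM is a per-sample, per-gene quantity, while a log2 fold change is the log of a ratio of two such quantities and so carries no unit. A minimal sketch, using the standard RPKM definition (reads per kilobase of transcript per million mapped reads); the function names and the pseudocount of 1 are assumptions for this example.

```python
import math
from statistics import mean


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(rpkm_a, rpkm_b, pseudocount=1.0):
    """Unitless log2 ratio of mean RPKM in group B over group A.

    The pseudocount (an assumption here) avoids taking log2 of zero for
    genes with no expression in one group.
    """
    return math.log2((mean(rpkm_b) + pseudocount) / (mean(rpkm_a) + pseudocount))


# 500 reads on a 2 kb gene in a library of 10 million mapped reads
print(rpkm(500, 2000, 10_000_000))
```

The per-group RPKM values, not the single fold-change number, are what a per-gene test would take as input.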

written 6.4 years ago by Devon Ryan