Question

Selecting a range of log fold change values for representing differentially expressed genes in an RNA-Seq experiment

1

9.8 years ago

menorca89 ▴ 20

I am carrying out an RNA-Seq study with four time points for each of two knockout strains, with NO replicates. Since I cannot rely on the p-value/q-value in this case, I thought of looking at the log fold change values obtained from Cuffdiff. I want to represent the most differentially expressed genes across the four time points of a strain as a heatmap, but I am unsure how to select them. Should I sort the values from the pairwise comparisons and select the top 10 and bottom 10? Or should I choose a range of values, such as -2 to +2? I was confused because the sample values corresponding to positive or negative INF are extremely high, and I thought those genes should be represented for sure.

If anyone has any suggestions about representing DE Genes across time points or between the strains (T1 Knockout1 vs T1 Knockout2, etc), I would be grateful for the same.

Many thanks in advance!
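The two selection strategies mentioned above (top/bottom N versus a fixed cutoff) can be sketched roughly as follows. This is only an illustration: the gene names and values are hypothetical, and the function names `select_top_bottom` and `select_by_cutoff` are made up for this sketch rather than taken from Cuffdiff or cummeRbund.

```python
import math

def select_top_bottom(log2fc, n):
    """Return (most up-regulated, most down-regulated) gene lists of length n."""
    ranked = sorted(log2fc, key=log2fc.get)  # ascending by log2 fold change
    top = ranked[-n:][::-1]   # largest values first
    bottom = ranked[:n]       # most negative values first
    return top, bottom

def select_by_cutoff(log2fc, cutoff):
    """Return genes whose absolute log2 fold change is at least `cutoff`."""
    return sorted(g for g, v in log2fc.items() if abs(v) >= cutoff)

# Hypothetical values for illustration only (not taken from the thread).
example = {
    "geneA": 11.5,
    "geneB": -10.3,
    "geneC": 0.4,
    "geneD": math.inf,   # zero expression in the first sample
    "geneE": -1.2,
    "geneF": 2.5,
}

top, bottom = select_top_bottom(example, 2)
selected = select_by_cutoff(example, 2.0)
print("top:", top)
print("bottom:", bottom)
print("beyond +/-2:", selected)
```

Note that infinite values always sort to the extremes, so genes with zero expression in one sample will dominate any top-N list unless they are handled separately.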

Cummerbund Heatmap log-fold RNA-Seq Cuffdiff • 7.7k views

updated 2.5 years ago by Ram 43k • written 9.8 years ago by menorca89 ▴ 20

Ram · Answer 1 · 2014-07-10

1

9.8 years ago

Devon Ryan 104k

If you look purely at differences due to the knockout, you do have replicates, so you can just use a likelihood ratio test and get some usable p-values. For the comparisons at individual time points, I would recommend not bothering. Your study isn't designed to look for differences at individual time points, so you'd largely just be wasting your time.

0

I am going to chime in to doubly agree with Devon. Using fold changes is not a good substitute for calculating a p-value with replicates, because some genes bounce around a lot (3-4x) under different environmental conditions for no apparent reason, while others would be expected to have low variance, so that even a 1.5x fold change may be biologically important. You don't want to spend time on data if you have no way of knowing what you are looking at.

updated 2.5 years ago by Ram 43k • written 9.8 years ago by Michele Busby ★ 2.2k

Ram · Answer 2 · 2014-07-10

I would check the absolute expression values as well, because low-expression genes can have deceptively high fold-change values (since something is infinitely more than nothing).

I tend to use log2(RPKM + 0.1) expression values, but I think you could add a rounding factor (pseudocount) anywhere between 0.01 and 1. If you want to get an idea about the effect of different rounding values, you can see this paper.
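To get a feel for how much the rounding factor matters, here is a minimal sketch of the log2(RPKM + factor) transform. The function names and the example RPKM values are assumptions chosen for illustration; only the 0.01 to 1 range comes from the text above.

```python
import math

def log2_expr(rpkm, pseudocount=0.1):
    """log2-transformed expression with a small rounding factor added."""
    return math.log2(rpkm + pseudocount)

def log2_fold_change(rpkm_a, rpkm_b, pseudocount=0.1):
    """log2 fold change of sample b over sample a on the adjusted scale."""
    return log2_expr(rpkm_b, pseudocount) - log2_expr(rpkm_a, pseudocount)

# Hypothetical RPKM pairs: one barely expressed gene, one well expressed gene.
low_gene = (0.0, 0.5)
high_gene = (100.0, 400.0)

for pc in (0.01, 0.1, 1.0):
    low_lfc = log2_fold_change(*low_gene, pseudocount=pc)
    high_lfc = log2_fold_change(*high_gene, pseudocount=pc)
    print(f"pseudocount={pc}: low gene {low_lfc:.2f}, high gene {high_lfc:.2f}")
```

The fold change of the barely expressed gene shrinks a lot as the factor grows, while the well expressed gene stays close to a 4-fold (log2 of about 2) change regardless of the factor.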

score 1 · Answer 3 · 2014-07-14

Thank you for your reply, Charles. I do realize that having no replicates is a problem and that the experiment should have been designed differently. But with the present data, I am now thinking of first removing those entries for which Cuffdiff could not find enough information and reported 'NO TEST' or 'FAIL'. Then, for those with an FPKM between 0 and 0.05 in one sample, I shall keep them if the corresponding value in the other sample is above 10 (an arbitrary cutoff), and discard them otherwise. Finally, I would perhaps filter on the log fold change values and keep those beyond +/- 1.5 or so.

I hope this will be better than simply going by the p-values in this case.
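The filtering steps described above could be sketched roughly like this. The dictionary keys mirror the `status`, `value_1` and `value_2` columns of Cuffdiff's gene_exp.diff output (where the status is spelled `NOTEST`), but the rows themselves are hypothetical, `keep_gene` is a name made up for this sketch, and the reading of the low-FPKM rule (one sample at or below 0.05, the other above 10) is an assumption.

```python
import math

# Status values to drop; both spellings are accepted here.
EXCLUDED_STATUS = {"NOTEST", "NO TEST", "FAIL"}

def keep_gene(row, low_fpkm=0.05, min_partner=10.0, min_abs_lfc=1.5):
    """Apply the three filters in order: status, low-FPKM rule, fold-change cutoff."""
    if row["status"] in EXCLUDED_STATUS:
        return False
    lo = min(row["value_1"], row["value_2"])
    hi = max(row["value_1"], row["value_2"])
    # A near-zero FPKM is only kept if the other sample is clearly expressed.
    if lo <= low_fpkm and hi <= min_partner:
        return False
    # abs() also works for +/- inf, so zero-versus-expressed genes pass this step.
    return abs(row["log2_fold_change"]) >= min_abs_lfc

# Hypothetical rows for illustration only.
rows = [
    {"gene": "g1", "status": "OK", "value_1": 0.0, "value_2": 50.0, "log2_fold_change": math.inf},
    {"gene": "g2", "status": "OK", "value_1": 0.02, "value_2": 0.5, "log2_fold_change": 4.64},
    {"gene": "g3", "status": "NOTEST", "value_1": 5.0, "value_2": 80.0, "log2_fold_change": 4.0},
    {"gene": "g4", "status": "OK", "value_1": 20.0, "value_2": 25.0, "log2_fold_change": 0.32},
    {"gene": "g5", "status": "OK", "value_1": 200.0, "value_2": 20.0, "log2_fold_change": -3.32},
]

kept = [row["gene"] for row in rows if keep_gene(row)]
print(kept)
```

All three cutoffs are parameters, so it is easy to check how sensitive the final gene list is to each arbitrary choice.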

Ram · Answer 4 · 2014-07-14

Hello,

Thank you for your replies.

Running Cuffdiff on the double knockout and triple knockout strains (at the same time point), I got a table with the fold change, p-value, q-value, etc. What I observe is that genes with a log fold change as high as 10 or 11 are not deemed significant, and mostly the genes with a value of 0 in one sample are categorized as 'significant'. The top 2 genes are actually knocked out in the first sample, though they still show slight expression, and I would have expected them to show up as DE genes. So I am still confused about the range of log fold change to consider. The values in this case range from -11 to 11. Some of the entries can be seen below. Thank you in advance.

value_1     value_2     log2(fold_change)     test_stat     p_value     q_value      significant
0.091093    262.705     11.4938               0.124772      0.01725     0.369422     no
4.21505     5174        10.2615               8.37264       0.00405     0.219178     no
2.13768     139.416     6.02721               0.756259      0.1381      0.683679     no
0           4.69E+06    inf                   #NAME?        5.00E-05    0.017711     yes
0           1.91E+06    inf                   #NAME?        5.00E-05    0.017711     yes


1657.79     0.602447    -11.4261              -20.6374      0.0652      0.543262     no
215.63      0.172036    -10.2916              -10.4816      0.14555     0.691207     no
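
For reference, the log2(fold_change) column in the table above is just log2(value_2 / value_1), which is why a zero in value_1 gives inf. The sketch below recomputes it for the rows shown and, next to it, a version with a small pseudocount added to both values so that every row gets a finite number. The 0.1 pseudocount and the function names are assumptions for illustration, not something Cuffdiff does.

```python
import math

# (value_1, value_2) pairs copied from the table above.
rows = [
    (0.091093, 262.705),
    (4.21505, 5174.0),
    (2.13768, 139.416),
    (0.0, 4.69e6),
    (0.0, 1.91e6),
    (1657.79, 0.602447),
    (215.63, 0.172036),
]

def raw_log2fc(v1, v2):
    """Plain log2(v2 / v1); infinite when exactly one value is zero."""
    if v1 == 0 and v2 == 0:
        return 0.0  # convention chosen for this sketch
    if v1 == 0:
        return math.inf
    if v2 == 0:
        return -math.inf
    return math.log2(v2 / v1)

def pseudocount_log2fc(v1, v2, pseudocount=0.1):
    """log2 fold change after adding a pseudocount to both values (always finite)."""
    return math.log2((v2 + pseudocount) / (v1 + pseudocount))

results = [(v1, v2, raw_log2fc(v1, v2), pseudocount_log2fc(v1, v2)) for v1, v2 in rows]
for v1, v2, raw, adjusted in results:
    print(f"{v1:>12g} {v2:>12g} raw={raw:8.4f} adjusted={adjusted:8.4f}")
```

With the pseudocount, the zero-versus-expressed genes can be ranked on the same scale as the others instead of sitting at inf.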