Question: Differential analysis shows different results with edgeR and in a box plot with t-test
newbie30 wrote:

Hi,

I have a dataset with 159 tumors and 113 normal samples. I did a differential analysis using edgeR and selected differentially expressed genes based on fold change > 2 and FDR < 0.05 (Tumors vs Normal). From the differentially expressed genes I selected upregulated genes based on positive logFC. Among the upregulated genes is FAP, a gene I'm interested in.

So FAP is an upregulated gene in Tumors compared to Normal samples.

But when I plotted the expression (logCPM) of FAP in Tumors versus Normal samples, the p-value is significant, but the plot shows that expression is higher in Normal samples. Here is the box plot. [image: box plot]

Why is this gene upregulated in Tumors according to edgeR, while the box plot shows higher expression in Normals? Why do the two analyses differ? Is anything wrong?

P.S. I calculated logCPM after filtering out lowly expressed genes.

modified 5 weeks ago • written 5 weeks ago by newbie30
Benn6.8k wrote:

A few comments. First, edgeR tests the counts, not logCPM. The counts are normalized within the model (model-based normalization), so what really happens there is somewhat behind the scenes. Second, you state that normal is higher in your boxplot, but you are referring to the median values, right? Did you also calculate the means?

written 5 weeks ago by Benn6.8k
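To illustrate the point about counts versus log values: a fold change computed from means on the count (or CPM) scale and a difference of mean log values can point in opposite directions when one group is skewed. The snippet below is only a sketch with made-up CPM values (`normal_cpm` and `tumor_cpm` are hypothetical, not the FAP data), and it does not reproduce edgeR's model; it just shows the arithmetic.

```python
import math
import statistics

# Hypothetical CPM values for one gene (NOT taken from the thread).
normal_cpm = [8.0, 9.0, 10.0, 11.0, 12.0]
tumor_cpm = [2.0, 3.0, 4.0, 5.0, 150.0]  # mostly low, one very high sample


def log2_ratio_of_means(a, b):
    """log2 fold change from arithmetic means on the CPM scale."""
    return math.log2(statistics.fmean(a) / statistics.fmean(b))


def difference_of_mean_logs(a, b):
    """Difference of mean log2 values, as a plot or test on log data sees it."""
    mean_log_a = statistics.fmean([math.log2(v) for v in a])
    mean_log_b = statistics.fmean([math.log2(v) for v in b])
    return mean_log_a - mean_log_b


count_scale_lfc = log2_ratio_of_means(tumor_cpm, normal_cpm)
log_scale_diff = difference_of_mean_logs(tumor_cpm, normal_cpm)

print(f"log2(mean tumor / mean normal): {count_scale_lfc:+.2f}")
print(f"mean log2 tumor - mean log2 normal: {log_scale_diff:+.2f}")
```

With these toy numbers the first value is positive and the second is negative, because a single very high sample dominates the arithmetic mean but is damped by the log transform.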

Yes, edgeR is not using logCPM. And yes, I was referring to the median values when I said that expression is higher in Normals than in Tumors. Is this not the right way to say higher or lower? No, I didn't calculate the mean.

written 5 weeks ago by newbie30

Judging from your boxplot, I think mean values would be higher in tumor vs normal, that's my point.

written 5 weeks ago by Benn6.8k

Oh yes I see the mean of Tumors is higher compared to Normals.

# A tibble: 2 x 4
  Type   count  mean    sd
  <chr>  <int> <dbl> <dbl>
1 Normals   113  3.08  1.26
2 Tumors    159  3.90  3.03

But in box plots, higher or lower is usually judged by the median, right? Or am I wrong?

written 5 weeks ago by newbie30
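A small sketch of how the mean and the median can disagree about which group is "higher" when one group is right-skewed. The values are invented for illustration (the lists `normals` and `tumors` and the helper `summarize` are hypothetical, not the thread's logCPM data).

```python
import statistics

# Hypothetical log-expression values (NOT the thread's data).
normals = [2.0, 2.5, 3.0, 3.5, 4.0]   # roughly symmetric
tumors = [1.0, 1.5, 2.0, 2.5, 12.0]   # right-skewed: one very high sample


def summarize(values):
    """Return the summary statistics discussed in the thread."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }


normal_summary = summarize(normals)
tumor_summary = summarize(tumors)

for name, summary in (("Normals", normal_summary), ("Tumors", tumor_summary)):
    print(name, {k: round(v, 2) for k, v in summary.items()})
```

Here the tumor group has the higher mean but the lower median, which is the same pattern as a box plot whose median bar sits lower for tumors while a mean-based comparison favors tumors.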

The black horizontal bar in the middle of the box is the median. edgeR is not using boxplots for analysis.

written 5 weeks ago by Benn6.8k

Yes, of course edgeR doesn't use boxplots. But when you run a t-test and make a boxplot, and the medians look like those in the plot above, do you consider the mean or the median to say which group is higher?

written 5 weeks ago by newbie30

t-test uses mean values, see here.

written 5 weeks ago by Benn6.8k
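As a rough check that the t-test works on means, the statistic can be recomputed from the summary table posted earlier in the thread (means 3.90 and 3.08, standard deviations 3.03 and 1.26, group sizes 159 and 113). This sketch assumes a Welch (unequal-variance) t-test; the thread does not say which variant was used, and `welch_t` is a helper written for this example.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom
    computed from per-group summary statistics."""
    var_a = sd_a ** 2 / n_a  # squared standard error of mean A
    var_b = sd_b ** 2 / n_b  # squared standard error of mean B
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# Summary values from the tibble in the thread: Tumors first, then Normals.
t_stat, df = welch_t(3.90, 3.03, 159, 3.08, 1.26, 113)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The sign of the statistic follows the difference in means (Tumors minus Normals), so it is positive here regardless of where the medians fall in the box plot.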

You should check the data for normality to see if t-tests are appropriate. The tumor sample looks suspiciously right-skewed and not very normally-distributed. A Wilcoxon Rank Sum test should be more appropriate here and probably adequate enough given the large sample size.

written 5 weeks ago by ATpoint16k
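For reference, a minimal sketch of the rank-sum (Mann-Whitney U) statistic mentioned above, using only the standard library. The data are invented, the function names are hypothetical, and the z-score uses the plain normal approximation without tie or continuity correction, which is only reasonable for large groups such as the ones in this thread. In practice a library routine would be the better choice.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for x: number of (x_i, y_j) pairs with x_i > y_j,
    counting ties as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def normal_approx_z(u, n_x, n_y):
    """z-score of U under the null hypothesis (normal approximation,
    no tie or continuity correction)."""
    mean_u = n_x * n_y / 2
    sd_u = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    return (u - mean_u) / sd_u


# Hypothetical log-expression values (NOT the thread's data).
tumors = [1.0, 1.5, 2.0, 2.5, 12.0]
normals = [2.2, 2.6, 3.0, 3.5, 4.0]

u = mann_whitney_u(tumors, normals)
prob_tumor_higher = u / (len(tumors) * len(normals))
z = normal_approx_z(u, len(tumors), len(normals))

print(f"U = {u}, P(tumor > normal) = {prob_tumor_higher:.2f}, z = {z:.2f}")
```

Because the statistic depends only on the ordering of the values, a single extreme tumor sample cannot dominate it the way it dominates the mean.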

There is no reason to use a t-test on logCPM data; the edgeR approach is much better suited for RNA-seq data. I assumed OP was using a t-test to double-check whether it gave more or less similar results to edgeR.

modified 5 weeks ago • written 5 weeks ago by Benn6.8k