Question

Differential analysis shows different results with edgeR and in a box plot with t-test

1

5.4 years ago

newbie ▴ 130

Hi,

I have a dataset with 159 tumor and 113 normal samples. I ran a differential expression analysis with edgeR (Tumors vs Normal) and selected differentially expressed genes based on fold change > 2 and FDR < 0.05. From the differentially expressed genes I selected the upregulated ones based on positive logFC. Among the upregulated genes is FAP, a gene I'm interested in.

So FAP is an upregulated gene in Tumors compared to Normal samples.
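The selection described above can be sketched as follows. This is a minimal Python sketch, not edgeR code: the gene names and values are invented, and it assumes that "fold change > 2" means an absolute log2 fold change above 1.

```python
# Hypothetical results table: (gene, log2 fold change Tumor vs Normal, FDR).
# All values are invented for illustration only.
results = [
    ("geneA", 1.8, 0.001),
    ("geneB", -2.3, 0.0004),
    ("geneC", 0.4, 0.02),
    ("geneD", 1.2, 0.20),
]

LOGFC_CUTOFF = 1.0  # assumption: fold change > 2 corresponds to |log2FC| > 1
FDR_CUTOFF = 0.05

# Differentially expressed: large enough effect in either direction and significant.
de_genes = [r for r in results if abs(r[1]) > LOGFC_CUTOFF and r[2] < FDR_CUTOFF]

# Upregulated in tumors: positive logFC among the differentially expressed genes.
up_in_tumor = [gene for gene, logfc, fdr in de_genes if logfc > 0]

print(up_in_tumor)
```

Note that the direction (up or down) comes only from the sign of logFC, which edgeR estimates from its count model, not from a box plot of logCPM.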

But when I plotted the expression (logCPM) of FAP in Tumors and Normal samples, the p-value is significant, yet the plot shows that expression is higher in Normal samples. Here is the box plot. [Image: box plot of FAP logCPM in Tumors vs Normal samples]

Why is this gene upregulated in Tumors according to edgeR, while the box plot shows higher expression in Normals? Why do the two analyses differ so much? Is anything wrong?

P.S. I calculated logCPM after filtering out lowly expressed genes.

RNA-Seq R differential analysis edger boxplot • 4.3k views

ADD COMMENT • link updated 3.8 years ago by harinisundareswaran • 0 • written 5.4 years ago by newbie ▴ 130

0

Did you use quantile normalisation?

ADD REPLY • link 3.8 years ago by harinisundareswaran • 0

0

Please use ADD REPLY not the answer field.

ADD REPLY • link 3.8 years ago by ATpoint 84k

score 1 · Answer 1 · 2019-04-18

1

5.4 years ago

Benn 8.3k

A few comments. First, edgeR does not test on logCPM but on counts. These counts are normalized within the model (model-based normalization), so what really happens there is somewhat hidden from view. Second, you state that Normal is higher in your boxplot, but you are referring to the median values, right? Did you also calculate the means?
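To illustrate why the scale matters, here is a small Python sketch with invented numbers. It does not reproduce what edgeR computes (edgeR fits a negative binomial model to the counts); it only shows that averaging on the original scale and averaging on the log scale can rank two groups in opposite order when one group contains a few very high samples.

```python
import math
import statistics

# Invented CPM-like values; the tumor group has one very high sample.
normal = [8, 9, 10, 11, 12]
tumor = [2, 3, 4, 5, 100]

def log2_of_mean(values):
    """Average on the original scale, then take the log."""
    return math.log2(statistics.mean(values))

def mean_of_log2(values):
    """Take the log of each value first, then average (as with logCPM)."""
    return statistics.mean([math.log2(v) for v in values])

# Original scale: tumor comes out higher.
print(log2_of_mean(tumor) > log2_of_mean(normal))

# Log scale: tumor comes out lower.
print(mean_of_log2(tumor) < mean_of_log2(normal))
```

Both comparisons print `True` for these toy values, so the same data can look "up" or "down" depending on where the log is taken.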

ADD COMMENT • link 5.4 years ago by Benn 8.3k

0

Yes, edgeR is not using logCPM. And yes, I am referring to the median values when I say that expression is higher in Normals compared to Tumors. Is this not the right way to decide which is higher or lower? No, I didn't calculate the mean.

ADD REPLY • link 5.4 years ago by newbie ▴ 130

2

Judging from your boxplot, I think mean values would be higher in tumor vs normal, that's my point.
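A toy example of that point, using invented logCPM-like values rather than the real data: when one group is right-skewed, its mean can be higher even though its median is lower.

```python
import statistics

# Invented logCPM-like values; the tumor group has one strong outlier.
normal = [2.6, 3.0, 3.1, 3.3, 3.5]
tumor = [1.5, 2.0, 2.5, 3.0, 11.0]

for name, values in (("Normals", normal), ("Tumors", tumor)):
    # The median is the bar drawn inside the box; the mean is usually not drawn.
    print(name, "mean:", statistics.mean(values), "median:", statistics.median(values))
```

Here the tumor group has the larger mean but the smaller median, which is the same pattern a box plot would hide if only the middle bar is read.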

ADD REPLY • link 5.4 years ago by Benn 8.3k

0

Oh yes, I see that the mean of Tumors is higher compared to Normals.

# A tibble: 2 x 4
  Type    count  mean    sd
  <chr>   <int> <dbl> <dbl>
1 Normals   113  3.08  1.26
2 Tumors    159  3.90  3.03

But in box plots, which group is higher or lower is usually judged by the median, right? Or am I wrong?

ADD REPLY • link 5.4 years ago by newbie ▴ 130

0

The black horizontal bar in the middle of the box is the median. edgeR is not using boxplots for analysis.

ADD REPLY • link 5.4 years ago by Benn 8.3k

0

Yes, of course edgeR doesn't use boxplots. But when you run a t-test and make a boxplot, and the median looks like it does in the plot above, do you consider the mean or the median to say which group is higher?
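For reference, a t-test compares means, not medians. As a rough check, the t statistic can be recomputed from the summary statistics posted above. This Python sketch assumes Welch's unequal-variance t-test; which variant was used for the plot is not stated in the thread.

```python
import math

# Summary statistics of FAP logCPM as posted earlier in this thread.
n_normal, mean_normal, sd_normal = 113, 3.08, 1.26
n_tumor, mean_tumor, sd_tumor = 159, 3.90, 3.03

# Per-group variance of the mean.
var_tumor = sd_tumor**2 / n_tumor
var_normal = sd_normal**2 / n_normal

# Welch's t statistic: difference in means divided by its standard error.
t_stat = (mean_tumor - mean_normal) / math.sqrt(var_tumor + var_normal)

# Welch-Satterthwaite approximation of the degrees of freedom.
df = (var_tumor + var_normal) ** 2 / (
    var_tumor**2 / (n_tumor - 1) + var_normal**2 / (n_normal - 1)
)

print(f"t = {t_stat:.2f}, df = {df:.0f}")
```

The statistic is positive (Tumors minus Normals), so under this assumption the significant p-value points in the same direction as edgeR: the mean is higher in Tumors, even though the median bar in the plot sits lower.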