Differential analysis shows different results with edgeR than in a box plot with t-test
1
0
2.6 years ago
newbie ▴ 100

Hi,

I have a dataset with 159 tumor and 113 normal samples. I ran a differential expression analysis with edgeR and selected differentially expressed genes based on fold change > 2 and FDR < 0.05 (Tumors vs Normal). From the differentially expressed genes I selected upregulated genes based on positive logFC. Among the upregulated genes is FAP, a gene I'm interested in.

So FAP is an upregulated gene in Tumors compared to Normal samples.
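For readers following along, the selection described above can be sketched roughly as follows. This is only an illustration: the gene names and numbers are made up, and the `logFC` and `FDR` keys are assumed to mirror the columns of an edgeR results table. Since edgeR reports logFC on the log2 scale, a fold change > 2 corresponds to |logFC| > 1.

```python
import math

# Hypothetical results table; gene names and values are invented for illustration.
results = [
    {"gene": "GENE_A", "logFC": 1.8, "FDR": 0.001},
    {"gene": "GENE_B", "logFC": -2.3, "FDR": 0.0004},
    {"gene": "GENE_C", "logFC": 0.4, "FDR": 0.02},
    {"gene": "GENE_D", "logFC": 1.5, "FDR": 0.20},
]

# Fold change > 2 on the log2 scale is |logFC| > 1.
LFC_CUTOFF = math.log2(2)
FDR_CUTOFF = 0.05

# Keep genes that pass both the effect-size and the significance threshold.
de_genes = [
    r for r in results
    if abs(r["logFC"]) > LFC_CUTOFF and r["FDR"] < FDR_CUTOFF
]

# Positive logFC means higher in Tumors for a Tumors-vs-Normal contrast.
up = [r["gene"] for r in de_genes if r["logFC"] > 0]
down = [r["gene"] for r in de_genes if r["logFC"] < 0]

print("up:", up)
print("down:", down)
```

The sign of logFC depends on how the contrast is set up, so it is worth confirming that the comparison really is Tumors minus Normal.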

But when I plotted the expression (logCPM) of FAP in Tumors and Normal samples, the p-value is significant but the plot shows that expression is higher in Normal samples. Here is the box plot.

Why is this gene upregulated in Tumors according to edgeR, while the box plot shows higher expression in Normals? Why do the two analyses differ? Is anything wrong?

P.S. I calculated logCPM after filtering out lowly expressed genes.

RNA-Seq R differential analysis edger boxplot • 2.1k views
0

Did you use quantile normalisation?

0

Please use ADD REPLY not the answer field.

1
2.6 years ago
Benn 8.2k

A few comments. First, edgeR does not test on logCPM but on counts. These counts are normalized within the model (model-based normalization), so what really happens there is somewhat behind the scenes. Second, you state that normal is higher in your boxplot, but you are referring to the median values, right? Did you also calculate the means?
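To illustrate the first comment, here is a small sketch of how a comparison on the count scale can disagree with a comparison of log values. The CPM numbers are invented, and the count-scale calculation is only a crude stand-in for what a count-based model estimates, not edgeR's actual procedure.

```python
import math
import statistics

# Hypothetical CPM values; one tumor sample has very high expression.
normal_cpm = [8, 8, 8, 8]
tumor_cpm = [4, 4, 4, 64]

# Count-scale view: log2 of the ratio of group means.
count_scale_lfc = math.log2(
    statistics.mean(tumor_cpm) / statistics.mean(normal_cpm)
)

# Log-scale view: difference of the mean log2 values.
log_scale_diff = (
    statistics.mean(math.log2(x) for x in tumor_cpm)
    - statistics.mean(math.log2(x) for x in normal_cpm)
)

print(f"log2 ratio of mean CPM:   {count_scale_lfc:.2f}")
print(f"difference of mean log2:  {log_scale_diff:.2f}")
```

With these numbers the count-scale ratio favors Tumors while the mean of the log values shows no difference, because the log transform shrinks the influence of the one high sample.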

0

Yes, edgeR is not using logCPM. And yes, I am referring to the median values when I say that expression is higher in Normals compared to Tumors. Is this not the right way to say higher or lower? No, I didn't calculate the mean.

2

Judging from your boxplot, I think mean values would be higher in tumor vs normal, that's my point.
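A quick sketch of how this can happen: if a subset of tumors has very high expression, the mean is pulled up while the median stays low. The logCPM values below are hypothetical and chosen only to show the effect.

```python
import statistics

# Hypothetical logCPM values; two tumors express the gene very strongly.
normals = [3.0, 3.1, 3.2, 3.0, 3.1]
tumors = [2.0, 2.2, 2.5, 9.0, 9.5]

for name, values in (("Normals", normals), ("Tumors", tumors)):
    print(
        f"{name}: median = {statistics.median(values):.2f}, "
        f"mean = {statistics.mean(values):.2f}"
    )

# The two summaries can point in opposite directions for skewed data.
median_says_normal_higher = statistics.median(normals) > statistics.median(tumors)
mean_says_tumor_higher = statistics.mean(tumors) > statistics.mean(normals)
```

Here the box plot (median) would suggest Normals are higher, while any mean-based comparison would suggest Tumors are higher.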

0

Oh yes, I see that the mean of Tumors is higher than that of Normals.

# A tibble: 2 x 4
  Type    count  mean    sd
  <chr>   <int> <dbl> <dbl>
1 Normals   113  3.08  1.26
2 Tumors    159  3.90  3.03


But in box plots, higher or lower is usually judged by the median, right? Or am I wrong?

0

The black horizontal bar in the middle of the box is the median. edgeR is not using boxplots for analysis.

0

Yes, of course edgeR doesn't use boxplots. But when you run a t-test and make a boxplot, and the median looks like it does in the plot above, do you consider the mean or the median to say which group is higher?

3

A t-test uses mean values, see here.
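As a rough check, the t statistic can be recomputed from the summary values posted earlier in this thread (means 3.08 and 3.90, SDs 1.26 and 3.03, n = 113 and 159). Which t-test variant was actually run is not stated, so the sketch below assumes Welch's unequal-variance form; `welch_t` is just a helper name used for illustration.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch t statistic for group A minus group B, from summary statistics."""
    # Standard error of the difference in means, without pooling variances.
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se

# Summary values quoted in this thread (logCPM of FAP), Tumors minus Normals.
t_stat = welch_t(3.90, 3.03, 159, 3.08, 1.26, 113)

# A positive value means Tumors are higher than Normals on the mean.
print(f"t = {t_stat:.2f}")
```

The statistic depends only on the means, SDs, and sample sizes, so the medians shown in the box plot do not enter the test at all. A positive t here agrees in direction with the positive logFC from edgeR.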