The results (logFC) are not consistence (edgeR packages)
1
0
Entering edit mode
3.9 years ago
solarchan7 • 0

I'm trying to find the differences between the test and control groups (genes). But the results are not consistence. For example, when there's a decrease in the test group, the fold change is positive! This happens in about 5/200 of them. Here's my codes, I don't know what's wrong... (because of the signed contact, I can't post any data here, sorry!) :

The data (df) is like:

gene Test1 Test2 Test3 Control1 Control2 Control3 Control4

A ...

B ...

C ...
...
n = 200

groups = factor(c(rep("test",3), rep("control",4)))
dge = DGEList(as.matrix(df), group = groups) 
dge = calcNormFactors(dge, method = "TMM") 
design_matrix = model.matrix(~0+groups, data = dge$samples) 
colnames(design_matrix) = c("test", "control") 
dge = estimateDisp(dge, design_matrix, robust=TRUE) 
FIT = glmQLFit(dge, design_matrix, robust=TRUE)
model = glmQLFTest(FIT, contrast = makeContrasts(contrasts = c("test-control"), levels = design_matrix)) 
output = data.frame(Gene = rownames(dge$counts), FoldChange = 2^-(model$table$logFC))

thanks in advance!

R RNA-Seq • 2.7k views
ADD COMMENT
0
Entering edit mode

Unclear what the problem is. Do you run the analysis on only 200 genes? What do you mean by when there's a decrease in the test group?

ADD REPLY
0
Entering edit mode

Thanks for the reply. For example, the data is like: (https://ibb.co/8ckvMdx) example (not real) data

where the test group values are less than the control groups, but the fold change is 2! but the following line, the test group values are also less than the control groups but the fold change is ~0.5 which make way more sense.

ADD REPLY
0
Entering edit mode

It looks like you are assuming the rows are in the same order in dge and in model$table. Are you sure this is the case?

ADD REPLY
0
Entering edit mode

ummmm..... not really. How can I check if they match up?

ADD REPLY
0
Entering edit mode

They are in the same order.

ADD REPLY
3
Entering edit mode
3.9 years ago
Gordon Smyth ★ 7.0k

edgeR logFCs are always consistent with the counts and the data, so the title of your question can't be correct.

You can't judge expression changes just by looking at the matrix of counts, because the different library sizes for each sample need to be taken into account as well. To examine the data for each gene adjusting for library sizes, the simplest way is to compute the logCPM values:

logCPM <- cpm(dge, log=TRUE)

The logFC reported by edgeR are closely related to (but not the same as) the difference in mean logCPM between groups.

ADD COMMENT
0
Entering edit mode

I'm a bit confuse :( . So if I want to get the fold change, I do:

logCPM = cpm(dge, log=TRUE) FC = 2^logCPM

? And this will fix the problem of different sign ?

ADD REPLY
0
Entering edit mode

No, that's not what I mean. If you unlog logCPM you will just get CPM, not logFC.

I don't understand why you're not simply using topTags(model) which shows you the differentially expressed genes and their logFCs. I don't follow what else you are trying to do or what problem you see.

Your code FoldChange = 2^-(model$table$logFC)) is converting logFCs to inverse-fold-changes. I'm not sure why you would want to do that, but fold-changes and inverse-fold-changes are of course always positive.

ADD REPLY
0
Entering edit mode

I am trying to compare the test group (test1, test2, and test3) and the control group (control1, control2, control3, and control4). So yes, it is possible to get what I want with topTags(model); however, the problem is: (for example) using topTags(model), I got gene A and gene B having logFC -1.5 and 0.9 respectively, however, they SHOULD have the same sign according to their original data where both genes have a significant larger control groups than the test group.

ADD REPLY
0
Entering edit mode

I'm not convinced that there is any problem. How have you judged that the genes SHOULD have the same logFC? None of the code you show in your question would allow you to make that judgement. What "original data" have you looked at and how?

ADD REPLY
0
Entering edit mode
  • I DON'T know how to solve the problem, but I think I can clarify some points here. I think the OP assumes logFC = log(A/B) = log(test group/control group). So if the test group value is larger than the control group value, then it should result in log(some value > 1). Here, log(1) = 0. So if the test group value is larger than the control group value, then it SHOULD have POSITIVE value. However, OP seems to have both positive and negative values for genes that have test value > control value.
ADD REPLY
0
Entering edit mode

solarchan7, if you believe there is something 'awry', then please paste a complete reproducible example, including code to generate sample input data and code to process such data such that the problem, as you interpret it, can be observed by all in this thread. Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 1926 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6