I ran edgeR and DESeq2 on two datasets each with 5 conditions. For each condition, I have at least three replicates. However, the results don't seem to match up between DESeq2 and edgeR. Only about 20-30% of the genes called by DESeq as DE are also in the set called by edgeR. DESeq2 also seems to call more genes than edgeR but there is no clear trend across conditions.
I tried running DESeq2 with the same initial filtering as in edgeR and with
independentFiltering=FALSE and the agreement percentages between the resulting sets of genes are not changed.
Do you know what are the possible causes of this difference in results?
I tried different sets of inputs counts (transcripts from different sources, genes etc).
What question are you asking? DESeq2 and edgeR are different software packages and do, indeed, often give different results. If you are concerned about your code, you'll need to post your code. If you are asking about the differences in results, I'm not sure that we will be able to help too much, as some difference is expected.
Some difference is indeed to be expected but a negative correlation between logFC changes and only a 20% match in results seems a bit over the top. I am looking for potential causes. The code is rather long and does not use anything new. I simply copy pasted the typical commands from the vignettes of the packages and set an FDR of 0.01.
20% match in results is, in my experience, not that unusual, particularly since you are using a low FDR (1%). The negative correlation in logFC is simply a switch in the direction when calculating contrasts. Just compare abs(logFC) to clear up that problem. Finally, the estimates of logFC are not simple ratios in either DESeq2 or edgeR, so there will be some differences between the two packages.
How many genes are called fo each?
In each case, DESeq2 calls about 2000 genes and edgeR about 1000-1400.
I checked the correlation between the logFC in DESeq2 and edgeR and it is very bad in every case. For one condition, it even correlates negatively. I am not sure why.
I've done a lot of this in the past and the logFCs should look very, very similar: basically a straight line along the x=y diagonal with perhaps some outliers. If not there's an issue with the raw counts and/or how the normalisation was done.
A negative correlation is probably due to the comparison being reversed in one of the two tools.
I would break this down by looking at the simple case first, only one comparison, and see if you get better agreement. If you do build it up from there.
Thank you for the suggestion. It helped a lot to choose only one condition and check every step. The "normalised" counts correlate very well but logFC still correlates poorly between edgeR and DESeq2 for each condition.
Can you show the count and logFC correlations?
What kind of conditions are you using? 2000 genes seems a bit much. If you take a cutoff value for the adjusted p-value, do you get a more reasonable common set? Do you see different logFC values on the common set you previously had?
I applied a cutoff for the adjusted pvalue of 0.01. The different logFC values are for all the genes not just the differentially expressed ones. I ran the typical:
and the typical:
My design is 6 replicates for condition A, 6 for condition B and 3 for condition C. I am interested in the contrasts between condition C and A and between B and A. After applying the contrasts, I took the logFC values of the common set of genes. They correlated negatively for C-A and poorly for B-A.
DeSeq2: Can you post the results for your 0.01 FDR of
summary(res)? And a MA plot? I am still surprised by 2000 DE genes. Look up the DeSeq2 vignette (3.9 Tests of log2 fold change above or below a threshold). If too many genes do indeed change between your conditions, it might affect the normalization (everybody, please correct me if I am wrong), and DeSeq2/edgeR might be imprecise.
If you take only the 500 most changed genes with each method, do you see a better agreement between methods?
Here is a summary of the results with DESeq2:
and an MA plot of the same object:
I you compare with the DeSeq2 MA plot in the manual, you see you have a lot more variations (just looking at the 1-2 range, not even the outliers). It seems the really lowly expressed genes were corrected, but you still have too many changing genes which is not good for DeSeq2 or EdgeR. They assume most genes have an unchanged expression. Do you really expect massive changes between your conditions?
I am a bit unsure about how to proceed from here. You might only consider the 500 genes with the highest adjusted p-value, maybe.