Question: edgeR: understanding the statistics
vm.higareda wrote (9 months ago):

I am confused about normalization and the statistics behind DE programs. I am using edgeR to analyze two conditions.

Example: raw counts for one gene, with four replicates per condition, control (C) and treatment (T):

Gene: FBgn0034710

Controls (C): 820, 1618, 1728, 1007

Treatments (T): 7195, 1252, 1312, 1291

Result of edgeR

logFC =1.10 logCPM = 6.5 LR = 9.77 PValue = 0.0017 FDR= 0.02
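The LR column shows these results come from edgeR's likelihood ratio test (glmLRT), which compares LR against a chi-square distribution whose degrees of freedom equal the number of coefficients tested: one, for a two-group comparison. As a quick consistency check (a sketch, not edgeR code), the upper-tail probability of a chi-square(1) variable is erfc(sqrt(LR / 2)):

```python
import math

def lrt_pvalue_1df(lr_statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(lr_statistic / 2.0))

# LR reported by edgeR for FBgn0034710 in the question
p = lrt_pvalue_1df(9.77)
print(p)
```

This gives roughly 0.00177, close to the reported 0.0017; both figures in the question are rounded.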

Why is FBgn0034710 statistically significant when one treatment replicate has far more raw counts (7195) than the others? I know that library size could be a factor, but library sizes are similar across all replicates.
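A back-of-the-envelope check of where the logFC comes from, assuming (the question does not give them) equal effective library sizes. Under that assumption the fitted mean of each group is simply the average of its raw counts, so the log2 ratio of the averages should land near edgeR's reported value:

```python
import math
from statistics import mean

# Raw counts for FBgn0034710 from the question
control = [820, 1618, 1728, 1007]
treatment = [7195, 1252, 1312, 1291]

def naive_log2_fold_change(ctrl, trt):
    """log2(mean(trt) / mean(ctrl)).

    Assumes equal effective library sizes, so no normalization is applied;
    edgeR itself fits a negative binomial GLM with library-size offsets.
    """
    return math.log2(mean(trt) / mean(ctrl))

lfc = naive_log2_fold_change(control, treatment)
print(f"{lfc:.2f}")  # → 1.09
```

The treatment average (2762.5) is about 2.1 times the control average (1293.25), and most of that comes from the single 7195 replicate.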

(Question modified 9 months ago by Kevin Blighe.)

Try taking such outliers out of a group and rerunning the statistical test. I do not think edgeR has any mechanism to prune such data. One should filter out such discrepancies in expression, both within and across groups, before feeding the data to edgeR.
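To see how much of the difference one replicate carries, a descriptive sketch: compare the log2 ratio of group means with and without the 7195 replicate, assuming equal library sizes (not stated in the question). This does not rerun edgeR's test; it only shows the effect size change:

```python
import math
from statistics import mean

control = [820, 1618, 1728, 1007]
treatment = [7195, 1252, 1312, 1291]

def log2_ratio_of_means(ctrl, trt):
    # Equal library sizes assumed: no normalization is applied here.
    return math.log2(mean(trt) / mean(ctrl))

with_outlier = log2_ratio_of_means(control, treatment)
# Drop the single high treatment replicate (index 0) and recompute.
without_outlier = log2_ratio_of_means(control, treatment[1:])
print(f"with outlier:    {with_outlier:.2f}")
print(f"without outlier: {without_outlier:.2f}")
```

The naive log2 fold change drops from about 1.09 to about -0.01, i.e. essentially no difference between the groups once that replicate is removed.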

Reply written 9 months ago by cpad0112
Kevin Blighe (USA / Europe / Brazil) wrote (9 months ago):


It's significant because that single outlier makes the difference in group means large. However, note that the log2 fold change (logFC) is only 1.10, i.e. about a 2.1-fold increase. Therefore, I would not consider this gene at all for downstream analyses. We usually filter genes for statistical significance using a combination of the FDR Q value (i.e. the FDR-adjusted P value) and logFC.
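A minimal sketch of that combined filter. It assumes Benjamini-Hochberg adjustment (the default method in edgeR's topTags) and uses hypothetical cutoffs of FDR < 0.05 and |logFC| >= 2; no specific thresholds are given in this thread, so pick your own:

```python
from typing import List

def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def passes_filter(fdr: float, log_fc: float,
                  max_fdr: float = 0.05, min_abs_lfc: float = 2.0) -> bool:
    """Keep a gene only if it is significant AND has a large enough effect.

    max_fdr and min_abs_lfc are hypothetical example thresholds.
    """
    return fdr < max_fdr and abs(log_fc) >= min_abs_lfc

# FBgn0034710 from the question: significant by FDR, but a modest fold change.
print(passes_filter(fdr=0.02, log_fc=1.10))  # → False
```

Using abs(log_fc) keeps strongly down-regulated genes as well as up-regulated ones.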

Hope that this helps.


Yes, it was useful; thank you for your answer. I am still confused about why these programs do not take outlier replicates into account.

Reply written 9 months ago by vm.higareda

You could try DESeq2, which does deal with outliers. I have not used edgeR.
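DESeq2 flags count outliers with Cook's distance, computed per gene and sample from its negative binomial GLM fit; by default, results() sets the p-value to NA when a sample's distance exceeds the 0.99 quantile of the F(p, m - p) distribution (p coefficients, m samples). The sketch below swaps in a much simpler analogue, Cook's distance for an ordinary least-squares group-means model on log2 counts, only to illustrate the idea. It is not DESeq2's computation:

```python
import math
from typing import List, Sequence

def cooks_distance_group_means(values: Sequence[float],
                               groups: Sequence[str]) -> List[float]:
    """Cook's distance for an OLS model with one mean per group.

    Simplified stand-in: DESeq2 computes Cook's distances from its
    negative binomial GLM, not from OLS on log counts.
    """
    m = len(values)
    labels = sorted(set(groups))
    p = len(labels)  # number of fitted coefficients (one mean per group)
    sizes = {g: list(groups).count(g) for g in labels}
    means = {g: sum(v for v, gg in zip(values, groups) if gg == g) / sizes[g]
             for g in labels}
    residuals = [v - means[g] for v, g in zip(values, groups)]
    s2 = sum(r * r for r in residuals) / (m - p)  # residual mean square
    distances = []
    for r, g in zip(residuals, groups):
        h = 1.0 / sizes[g]  # leverage of each sample in a group-means model
        distances.append(r * r / (p * s2) * h / (1.0 - h) ** 2)
    return distances

counts = [820, 1618, 1728, 1007, 7195, 1252, 1312, 1291]
groups = ["C"] * 4 + ["T"] * 4
distances = cooks_distance_group_means([math.log2(c) for c in counts], groups)
most_influential = max(range(len(counts)), key=lambda i: distances[i])
print(counts[most_influential])  # → 7195
```

Because every sample has the same leverage here (four replicates per group), the sample with the largest residual from its group mean, the 7195 replicate, gets the largest distance.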

cpad's suggestion (above) to remove outliers is valid only if the sample is a genuinely problematic sample whose values are not related to the biological condition being studied.

Reply written 9 months ago by Kevin Blighe

edgeR's difficulty with outliers is a long-standing issue, and some people switched to DESeq2 for that reason, whether justified or not. A few suggestions were to filter out outliers either programmatically (e.g. using the median) or manually. An extension to edgeR for handling outliers is discussed in this paper. I guess trying DESeq2 should shed some light on this issue.
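One way to filter outliers programmatically with the median, as a sketch: flag any replicate whose count is more than k-fold above or below its group's median. The cutoff k = 3 is a hypothetical choice, and applying it to raw counts assumes comparable library sizes (normalized values such as CPM would be a safer input):

```python
from statistics import median
from typing import Dict, List

def flag_median_outliers(groups: Dict[str, List[float]],
                         k: float = 3.0) -> Dict[str, List[int]]:
    """Return, per group, indices of replicates more than k-fold from the group median.

    k is a hypothetical cutoff. Groups with a median of zero are skipped
    (nothing is flagged), since fold differences are undefined there.
    """
    flagged = {}
    for name, values in groups.items():
        med = median(values)
        flagged[name] = [i for i, v in enumerate(values)
                         if med > 0 and (v > k * med or v < med / k)]
    return flagged

counts = {
    "control": [820, 1618, 1728, 1007],
    "treatment": [7195, 1252, 1312, 1291],
}
print(flag_median_outliers(counts))  # → {'control': [], 'treatment': [0]}
```

With only four replicates per group the median is the average of the two middle values, so a rule like this is crude; it flags gross outliers such as the 7195 replicate but says nothing about whether removing them is biologically justified.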

Reply written (and later modified) 9 months ago by cpad0112

Thank you for your response; the old post and the paper are really interesting. This is an old problem. I don't understand why the edgeR developers have not solved it.

Reply written 9 months ago by vm.higareda