Question: edgeR understanding statistics
9 weeks ago, vm.higareda wrote:

I am confused about the normalization and statistics behind DE programs. I am using edgeR to analyze two conditions.

Example for one gene (raw counts), with four replicates per condition, control (C) and treatment (T):

gene = FBgn0034710

Controls = 820, 1618, 1728, 1007

Treatments = 7195, 1252, 1312, 1291

edgeR result:

logFC = 1.10, logCPM = 6.5, LR = 9.77, PValue = 0.0017, FDR = 0.02

Why is the FBgn0034710 gene statistically significant when one replicate has far more raw counts (7915) than the others? I know that library size could be a factor, but it is similar across the replicates.


Try removing such outliers within a group and rerunning the statistical test. I do not think edgeR has any mechanism to prune such data. One should filter out such discrepancies in expression, within and across groups, before feeding the data to edgeR.

written 9 weeks ago by cpad0112
9 weeks ago, Kevin Blighe (London) wrote:

Hey,

It's significant because that single outlier pulls the treatment mean well above the control mean. Note, however, that the log fold change (logFC) is only 1.10, so I would not consider this gene at all for downstream analyses. Usually we filter genes on a combination of the FDR Q value (i.e. the FDR-adjusted P value) and the logFC.
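To make the effect of that one replicate concrete, here is a rough sketch in plain Python (not edgeR). It assumes equal library sizes, so it simply compares group means of the raw counts quoted in the question; edgeR itself works with normalized library sizes and a negative binomial model, so treat this only as an approximation.

```python
import math

# Raw counts for FBgn0034710 as listed in the question.
control = [820, 1618, 1728, 1007]
treatment = [7195, 1252, 1312, 1291]


def mean(values):
    return sum(values) / len(values)


def log2_fold_change(treated, controls):
    """log2 ratio of group means.

    Assumption: library sizes are equal, so no normalization is applied.
    """
    return math.log2(mean(treated) / mean(controls))


# All four treatment replicates, including the very high one.
with_outlier = log2_fold_change(treatment, control)

# Same comparison after dropping the largest treatment replicate.
without_outlier = log2_fold_change(sorted(treatment)[:-1], control)

print(f"log2 fold change, all replicates: {with_outlier:.3f}")
print(f"log2 fold change, top replicate dropped: {without_outlier:.3f}")
```

With all replicates the value comes out close to the reported logFC of 1.10; with the highest replicate dropped it is close to zero, i.e. the apparent change comes almost entirely from that single sample.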

Hope that this helps.


Yes, it was useful, thank you for your answer. I am still confused about why this kind of program does not take outlier replicates into account.

written 9 weeks ago by vm.higareda

You could try DESeq2, which does deal with outliers. I have not used edgeR.

cpad's suggestion (above) to remove outliers is valid only if the sample is a genuinely problematic sample whose values are not related to the biological condition being studied.

written 9 weeks ago by Kevin Blighe

edgeR's problem with outliers is an age-old one (https://support.bioconductor.org/p/45417/), and some people have shifted to DESeq2 for that reason (https://support.bioconductor.org/p/89526/), valid or not. A few suggestions were to filter out outliers either programmatically (median) or manually. An addition to edgeR for handling outliers is discussed in this paper. I guess trying DESeq2 should throw some light on this issue.
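As an illustration of what a programmatic, median-based check could look like, here is a minimal sketch in plain Python. It is not what edgeR or DESeq2 do internally: it flags replicates whose modified z-score (based on the median and the median absolute deviation) exceeds a cutoff. The function name `flag_outliers` and the cutoff of 3.5 are assumptions chosen for the example, and with only four replicates per group the result should be read as a hint, not a verdict.

```python
from statistics import median


def flag_outliers(counts, threshold=3.5):
    """Flag values far from the group median (hypothetical helper).

    Uses the modified z-score 0.6745 * |x - median| / MAD.
    The threshold of 3.5 is a conventional but arbitrary choice.
    """
    med = median(counts)
    mad = median([abs(c - med) for c in counts])
    if mad == 0:
        # No spread around the median: the score is undefined, flag nothing.
        return [False] * len(counts)
    return [0.6745 * abs(c - med) / mad > threshold for c in counts]


# Raw counts for FBgn0034710 as listed in the question.
control = [820, 1618, 1728, 1007]
treatment = [7195, 1252, 1312, 1291]

print("control flags:  ", flag_outliers(control))
print("treatment flags:", flag_outliers(treatment))
```

On these counts only the first treatment replicate is flagged. Flagging is not the same as removing: whether the sample should actually be dropped still depends on whether it is genuinely problematic.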

modified 9 weeks ago • written 9 weeks ago by cpad0112

Thank you for your response; the old post and the paper are really interesting. This is an old problem. I do not understand why edgeR development has not solved it.

written 9 weeks ago by vm.higareda