Question

edgeR understanding statistics

1

6.6 years ago

vm.higareda ▴ 30

I am confused about the normalization and statistics behind DE programs. I am using edgeR to analyze two conditions.

Example for one gene (raw counts), with four replicates per condition, control (C) and treatment (T):

gene= FBgn0034710

Controls = 820, 1618, 1728, 1007

Treatments = 7195, 1252, 1312, 1291

Result from edgeR:

logFC = 1.10, logCPM = 6.5, LR = 9.77, PValue = 0.0017, FDR = 0.02

Why is gene FBgn0034710 statistically significant when only one replicate has far more raw counts (7915) than the others? I know that library size could be a factor, but it is similar across the replicates.

RNA-Seq edgeR statistics transcriptomics • 3.5k views

updated 6.6 years ago by Kevin Blighe 87k • written 6.6 years ago by vm.higareda ▴ 30

1

Try removing such outliers within a group and rerunning the statistical test. I do not think edgeR has any mechanism to prune such data. One should filter out such discrepancies at the expression level, within and across groups, and then feed the data to edgeR.

6.6 years ago by cpad0112 21k

score 0 · Answer 1 · 2017-10-11

0

6.6 years ago

Kevin Blighe 87k

Hey,

It's significant because the difference in means will be great due to that single outlier. However, you should note that the log fold change (logFC) is just 1.10... Therefore, I would not consider this gene at all for downstream analyses. Usually we use a combination of both FDR Q value (i.e. FDR-adjusted P values) and logFC for filtering genes for statistical significance.
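To make the point about the means concrete, here is a minimal Python sketch using the raw counts posted in the question. It computes a naive log2 ratio of group means, which is not how edgeR estimates logFC (edgeR fits a negative binomial model with library-size normalization and dispersion estimates). The helper name `naive_log2_fc` is hypothetical, and the sketch assumes library sizes are roughly equal, as the question states.

```python
from math import log2
from statistics import mean

# Raw counts for FBgn0034710 as posted in the question
control = [820, 1618, 1728, 1007]
treatment = [7195, 1252, 1312, 1291]


def naive_log2_fc(treated, ctrl):
    """log2 ratio of group means; ignores library size and dispersion."""
    return log2(mean(treated) / mean(ctrl))


# All four treatment replicates, including the very high one
with_outlier = naive_log2_fc(treatment, control)

# Drop the largest treatment replicate, only to show how much it drives the ratio
without_outlier = naive_log2_fc(sorted(treatment)[:-1], control)

print(f"with the high replicate:    {with_outlier:.3f}")
print(f"without the high replicate: {without_outlier:.3f}")
```

With all replicates the naive value comes out close to the reported logFC of about 1.1, and without the high replicate it is close to zero, so the apparent change rests almost entirely on that one sample.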

Hope that this helps.

0

Yes, it was useful; thank you for your answer. I am still confused about why this kind of program does not take outlier replicates into account.

6.6 years ago by vm.higareda ▴ 30

0

You could try DESeq2, which does deal with outliers. I have not used edgeR.

cpad's suggestion (above) to remove outliers is valid only if the sample is a genuinely problematic sample whose values are not related to the biological condition being studied.

6.6 years ago by Kevin Blighe 87k

1

edgeR's problem with outliers is an age-old one (https://support.bioconductor.org/p/45417/), and some people shifted to DESeq2 for that reason (https://support.bioconductor.org/p/89526/), valid or not. A few suggestions were to filter out outliers either programmatically (using the median) or manually. An addition to edgeR for handling outliers is discussed in this paper. I guess trying DESeq2 should throw some light on this issue.
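As an illustration of what a programmatic, median-based filter could look like, here is a small Python sketch that flags replicates lying far from their group median, measured in median absolute deviations (MAD). It is not taken from the linked threads or from edgeR or DESeq2; the function name `flag_outliers` and the cutoff `k=5.0` are hypothetical choices, and it works on raw counts only because the library sizes are said to be similar.

```python
from statistics import median


def flag_outliers(counts, k=5.0):
    """Return indices of replicates more than k MADs from the group median.

    k is an arbitrary, hypothetical cutoff chosen for illustration.
    """
    med = median(counts)
    # Median absolute deviation: a spread estimate that one extreme value cannot inflate
    mad = median([abs(c - med) for c in counts])
    if mad == 0:
        return []  # no spread to compare against, so flag nothing
    return [i for i, c in enumerate(counts) if abs(c - med) > k * mad]


# Raw counts for FBgn0034710 as posted in the question
control = [820, 1618, 1728, 1007]
treatment = [7195, 1252, 1312, 1291]

print("control flagged:  ", flag_outliers(control))
print("treatment flagged:", flag_outliers(treatment))
```

On these counts only the first treatment replicate is flagged. A flag like this is a prompt to inspect the sample, not a reason to drop it automatically; removal is justified only if the sample is genuinely problematic.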

6.6 years ago by cpad0112 21k

0

Thank you for your response; the old post and the paper are really interesting. This is an old problem. I do not understand why the edgeR developers have not solved it.

6.6 years ago by vm.higareda ▴ 30