Question

Interpreting the output of edgeR table coefficients

3.4 years ago

calin.timbus ▴ 10

Hello everybody,

I am a beginner in R and bioinformatics, and I am following a tutorial based on Dan MacLean's book, R Bioinformatics Cookbook.

I have a question about interpreting the results of a differential expression analysis.

In essence, I am performing differential expression on a subset of Drosophila melanogaster samples, and I would like a plain-language explanation of the results.

In practice, I select a subset of larval samples, run the differential expression analysis, and get the results below:

grouping <- droplevels(phenoData(modencodefly.eset)[['stage']][columns_of_interest])

print(columns_of_interest)

print(grouping)

counts_of_interest <- exprs(modencodefly.eset)[, columns_of_interest]

eset_dge <- edgeR::DGEList( counts = counts_of_interest, group = grouping )

design <- model.matrix(~ grouping)

eset_dge <- edgeR::estimateDisp(eset_dge, design)

fit <- edgeR::glmQLFit(eset_dge, design)

result <- edgeR::glmQLFTest(fit, coef=2)

edgeR::topTags(result)

The table below is the result of the differential expression analysis.

               logFC    logCPM        F       PValue          FDR
FBgn0027527 6.318665 11.148756 42854.72 1.132951e-41 1.684584e-37
FBgn0037424 6.417770  9.715826 33791.15 2.152507e-40 1.518091e-36
FBgn0037430 6.557774  9.109132 32483.00 3.510727e-40 1.518091e-36
FBgn0037414 6.337846 10.704514 32088.92 4.083908e-40 1.518091e-36
FBgn0029807 6.334590  9.008720 27648.19 2.585312e-39 7.688200e-36
FBgn0037224 7.055635  9.195077 24593.62 1.102456e-38 2.732070e-35

Can someone explain to me the first 3 columns?

Thank you

R genome sequencing • 3.2k views

ADD COMMENT • link updated 3.4 years ago by ATpoint 82k • written 3.4 years ago by calin.timbus ▴ 10

See this thread

ADD REPLY • link 3.4 years ago by brunobsouzaa ▴ 830

score 4 · Accepted Answer · 2020-12-31

logFC is the fold change on the log scale (edgeR reports it as log2). One uses the log to make changes symmetric around zero, for example:

> 70/30; 30/70; log2(70/30); log2(30/70)
[1] 2.333333
[1] 0.4285714
[1] 1.222392
[1] -1.222392

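Without the log, fold changes smaller than one (downregulation) would be compressed between zero and one, while those larger than one (upregulation) would range from one to infinity. The log compensates for this: a logFC of +1.22 and one of -1.22 describe changes of the same magnitude in opposite directions.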
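To put the logFC values from the question into linear terms, here is a minimal sketch in base R. It only assumes that the values are on the log2 scale, which is what edgeR reports; the two numbers are copied from the first rows of the table in the question.

```r
# logFC values copied from the first two rows of the topTags table above
logFC <- c(FBgn0027527 = 6.318665, FBgn0037424 = 6.417770)

# edgeR reports log2 fold changes, so 2^logFC gives the linear fold change
fold_change <- 2^logFC
print(round(fold_change, 1))

# A negative logFC of the same size corresponds to the reciprocal fold change
stopifnot(isTRUE(all.equal(2^(-logFC), 1 / fold_change)))
```

A logFC a little above 6 therefore means a difference of somewhat more than 64-fold (2^6) between the two groups being compared.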

logCPM is the average expression of that particular gene across all samples, on the log2 scale, expressed in counts per million (CPM, as calculated by edgeR after normalization). It reflects the baseline level of the gene in the sample population you are testing. Because CPM is not corrected for gene length, longer genes and more highly expressed genes tend to have higher logCPM, and vice versa. Generally, statistical power increases with this value, since higher counts are less affected by sampling noise.
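The idea behind logCPM can be sketched in base R on a toy count matrix. This is a simplified illustration, not edgeR's exact calculation: the counts are made up, the offset `prior` is a hypothetical value added only to avoid log2(0), and edgeR's own functions (`edgeR::cpm()` and `edgeR::aveLogCPM()`) handle the prior count and library sizes more carefully.

```r
# Toy count matrix (made-up numbers): 3 genes x 4 samples
counts <- matrix(
  c( 10,  20,  15,  12,
    500, 450, 620, 580,
      0,   3,   1,   2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("sample", 1:4))
)

# Library size = total counts per sample
lib_sizes <- colSums(counts)

# Counts per million: scale each sample (column) by its library size
cpm <- t(t(counts) / lib_sizes) * 1e6

# Hypothetical small offset so that zero counts do not give -Inf
prior <- 2
log_cpm <- log2(cpm + prior)

# Average over samples: one value per gene, analogous to the logCPM column
ave_log_cpm <- rowMeans(log_cpm)
print(round(ave_log_cpm, 2))
```

The highly expressed geneB ends up with the largest value and the nearly silent geneC with the smallest, which is the sense in which logCPM summarizes the overall abundance of a gene.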

F is the F-statistic from the quasi-likelihood F-test.

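PValue is the nominal p-value derived from F, without any multiple testing correction. FDR is the PValue after adjustment for multiple testing across all tested genes (by default with the Benjamini-Hochberg method), and it is the column to use when deciding which genes to call significant.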
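The Benjamini-Hochberg adjustment can be reproduced in base R with `p.adjust()`. The sketch below uses made-up p-values, not those from the question, and also spells out the calculation by hand to show what the adjustment does.

```r
# Made-up nominal p-values for six hypothetical genes
p <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)

# Benjamini-Hochberg adjusted p-values
fdr <- p.adjust(p, method = "BH")
print(round(fdr, 4))

# The same calculation by hand: scale each p-value by n / rank,
# then enforce monotonicity starting from the largest p-value
n <- length(p)
o <- order(p)
manual <- numeric(n)
manual[o] <- pmin(1, rev(cummin(rev(p[o] * n / seq_len(n)))))

stopifnot(isTRUE(all.equal(fdr, manual)))
```

Because each p-value is scaled by the number of tests divided by its rank, the adjusted values are never smaller than the nominal ones, which is why the FDR column in the table is larger than the PValue column.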