Interpreting the output of edgeR table coefficients
3.4 years ago
calin.timbus ▴ 10

Hello everybody,

I am a beginner in R and bioinformatics, and I am following a tutorial based on Dan MacLean's book, R Bioinformatics Cookbook.

I have a question about interpreting the results of a differential expression analysis.

In essence, I am performing differential expression on a subset of Drosophila melanogaster samples, and I would like a plain-language explanation of the results.

In practice, I select a subset of larvae, run the differential expression analysis, and get the results below:

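# Stage labels for the selected samples, with unused factor levels dropped
grouping <- droplevels(phenoData(modencodefly.eset)[['stage']][columns_of_interest])
print(columns_of_interest)
print(grouping)

# Raw counts for the selected samples, wrapped in a DGEList
counts_of_interest <- exprs(modencodefly.eset)[, columns_of_interest]
eset_dge <- edgeR::DGEList(counts = counts_of_interest, group = grouping)

# Design with an intercept (first level of grouping) plus one column per other level
design <- model.matrix(~ grouping)

# Estimate dispersions, fit the quasi-likelihood model, and test the second coefficient
eset_dge <- edgeR::estimateDisp(eset_dge, design)
fit <- edgeR::glmQLFit(eset_dge, design)
result <- edgeR::glmQLFTest(fit, coef = 2)

edgeR::topTags(result)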

The table below is the result obtained after D.E.

               logFC    logCPM        F       PValue          FDR
FBgn0027527 6.318665 11.148756 42854.72 1.132951e-41 1.684584e-37
FBgn0037424 6.417770  9.715826 33791.15 2.152507e-40 1.518091e-36
FBgn0037430 6.557774  9.109132 32483.00 3.510727e-40 1.518091e-36
FBgn0037414 6.337846 10.704514 32088.92 4.083908e-40 1.518091e-36
FBgn0029807 6.334590  9.008720 27648.19 2.585312e-39 7.688200e-36
FBgn0037224 7.055635  9.195077 24593.62 1.102456e-38 2.732070e-35

Can someone explain to me the first 3 columns?

Thank you


See this thread

3.4 years ago
ATpoint 82k

logFC is the fold change between the two groups being compared, on the log2 scale. The log is used to make changes symmetric around zero, for example:

> 70/30; 30/70; log2(70/30); log2(30/70)
[1] 2.333333
[1] 0.4285714
[1] 1.222392
[1] -1.222392

Without the log, FCs smaller than one would be compressed between zero and one, while FCs greater than one range from one to infinity. The log compensates for this.
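To read a logFC back as an ordinary fold change, raise 2 to that power. A minimal sketch in base R, using the logFC of the top gene copied from the table in the question (the variable names are only for illustration):

```r
# logFC for FBgn0027527, copied from the topTags output in the question
log_fc <- 6.318665

# Undo the log2 transform to get the plain fold change
fold_change <- 2^log_fc

# A negative logFC of the same size is the reciprocal fold change
inverse_fold_change <- 2^(-log_fc)

print(fold_change)
print(fold_change * inverse_fold_change)
```

So a logFC of about 6.3 corresponds to roughly an 80-fold difference between the two groups.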

logCPM is the average expression of that gene across all samples, on the log2 scale, expressed in counts per million (CPM, as calculated by edgeR after normalization). It reflects the baseline expression level of the gene in the samples you are testing. Because CPM is not corrected for gene length, longer or more highly expressed genes generally have higher logCPM, and vice versa. Statistical power generally rises with this value.
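To make the idea concrete, here is a rough base R sketch of an average log2 CPM on a made-up count matrix. This is a simplification and not what edgeR does internally: edgeR also uses normalization factors and its own prior-count handling, so its logCPM values will differ. The counts, gene names, and the 0.5 pseudocount are all assumptions for illustration.

```r
# Hypothetical raw counts: 3 genes (rows) x 4 samples (columns)
counts <- matrix(
  c(10, 12, 200, 180,
    500, 450, 520, 480,
    0, 1, 3, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("sample", 1:4))
)

# Library size = total counts per sample
lib_size <- colSums(counts)

# Counts per million: scale each sample's counts by its library size
cpm <- t(t(counts) / lib_size) * 1e6

# Log2 CPM with a small pseudocount (0.5, an arbitrary choice) to avoid log2(0)
log_cpm <- log2(t(t(counts + 0.5) / (lib_size + 1)) * 1e6)

# Average across samples: one value per gene, analogous to the logCPM column
avg_log_cpm <- rowMeans(log_cpm)
print(round(avg_log_cpm, 2))
```

The gene with the highest counts across samples (geneB here) ends up with the highest average, which is the sense in which logCPM tracks the overall expression level of a gene.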

F is the F-statistic from the quasi-likelihood F-test.
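Which comparison that test (and therefore logFC) refers to depends on the design matrix and on coef = 2. A small base R sketch with a made-up two-level factor shows how to check this. The level names stageA and stageB are hypothetical, since the real levels of the stage column are not shown in the question.

```r
# Hypothetical grouping factor; the real level names come from the
# 'stage' column of the phenotype data and are not shown in the question
grouping <- factor(c("stageA", "stageA", "stageB", "stageB"))

# Same design formula as in the question
design <- model.matrix(~ grouping)

# Column 1 is the intercept (the reference level, here stageA);
# column 2 is the difference stageB - stageA, which is what coef = 2 tests
print(colnames(design))
print(levels(grouping)[1])
```

With this kind of design, a positive logFC means higher expression in the second level of the factor than in the first (reference) level. Running colnames(design) and levels(grouping) on the real data tells you which stages those are.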

PValue is the nominal p-value derived from F without any multiple testing correction and FDR is the PValue after (by default) Benjamini-Hochberg correction.
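The Benjamini-Hochberg step can be reproduced with base R's p.adjust. The sketch below uses made-up p-values and also repeats the adjustment by hand, to show what the correction does:

```r
# Hypothetical nominal p-values for five genes
p <- c(0.001, 0.008, 0.039, 0.041, 0.60)
n <- length(p)

# Built-in Benjamini-Hochberg adjustment
fdr_builtin <- p.adjust(p, method = "BH")

# The same thing by hand: scale each sorted p-value by n / rank,
# then enforce monotonicity from the largest p-value downwards
ord <- order(p)
scaled <- p[ord] * n / seq_len(n)
fdr_manual <- numeric(n)
fdr_manual[ord] <- pmin(1, rev(cummin(rev(scaled))))

print(fdr_builtin)
print(isTRUE(all.equal(fdr_builtin, fdr_manual)))
```

The monotonicity step is why several genes can share the same FDR even though their p-values differ, as with the 1.518091e-36 values in the table in the question.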
