Calculate fold change in edgeR with one sample per condition
2
1
Entering edit mode
4 months ago

Hi,

We have run a pilot RNA-Seq study with one sample per condition, this is just a test run. I understand there is no valid statistical test in this case, however just curious to obtain differential expression through edgeR package in R assuming dispersion = 0.4 for the human data. I have a normal (baseline) sample followed by 5 different samples. When I run the edgeR package, I want to indicate my normal sample (as baseline), however, I am unsure what sample here is taken as a baseline for calculation of fold change.

FC calculation

FC = Normal/Test_1 (OR) 
FC = Test_1/Normal

Samples

Normal (baseline) = Normal
Test (Treated) = Test_1

Data

dput(df_data)
structure(list(Normal = c(0L, 184L, 60L, 0L, 7L, 0L, 87L, 0L, 
0L, 21L, 193L, 29L, 0L, 0L, 3L, 50L), Test_1 = c(0, 140.5, 64, 
0, 4, 0, 83, 0, 1, 51.5, 199, 25, 0, 0, 5, 62)), class = "data.frame", row.names = c("Gene1", 
"Gene2", "Gene3", "Gene4", "Gene5", "Gene6", "Gene7", "Gene8", 
"Gene9", "Gene10", "Gene11", "Gene12", "Gene13", "Gene14", "Gene15", 
"Gene16"))


dput(df_metadata)
structure(list(SampleID = c("xxxx1", "xxxx2"), CoreLabID = c("Normal", 
"Test_1")), class = "data.frame", row.names = c("Normal", "Test_1"
))

Here is the code that I am running

bcv <- 0.4
y <- DGEList(counts=df_data, group=df_metadata$CoreLabID)
et <- exactTest(y, dispersion=bcv^2)
View(et$table)
structure(list(logFC = c(0, -0.67280976110796, -0.190706878123648, 
0, -1.06592239047733, 0), logCPM = c(0.456451013758882, 6.84518828528986, 
5.46338499556895, 0.456451013758882, 2.37389406911164, 0.456451013758882
), PValue = c(1, 0.433579402199822, 0.851984371429117, 1, 0.542580328250669, 
1)), row.names = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5", 
"Gene6"), class = "data.frame")

View(et$comparison)
c("Test_1", "Normal")

Thank you,

Toufiq

R edgeR expression rna-seq differential • 459 views
ADD COMMENT
2
Entering edit mode
8 weeks ago
Gordon Smyth ★ 3.5k

edgeR has compared Test_1 to Normal so, yes, the normal sample has been taken as the baseline. et$comparison records the numerator and denominator for the fold-change respectively.

You can find this out for yourself by reading the help page for exactTest, help(exactTest). See the documentation for the pair argument.

If you use the standard edgeR pipeline

topTags(et)

then the results will be presented in a easy to read form and the output will tell you which sample has been compared to which.

ADD COMMENT
0
Entering edit mode

Gordon Smyth

Thank you very much for the quick response and suggestions. This is very helpful.

ADD REPLY
1
Entering edit mode
4 months ago

How about this work-flow:

dge <- DGEList(counts)
dge <- calcNormFactors(dge)
my.cpm <- cpm(dge, log = F)
my.cpm.log <- cpm (dge, log = T, prior.count = 2)
allFC <- my.cpm[,1] / my.cpm[,2]
allLogFC <- my.cpm.log[,1] - my.cpm.log[,2]
ADD COMMENT
0
Entering edit mode

@Michael Dondrup, thank you. This looks good!

Along with the fold change, I would also like to output the p-values (as additional column). In my question using edgeR, how does the fold change gets calculated in this package?

bcv <- 0.4
y <- DGEList(counts=df_data, group=df_metadata$CoreLabID)
et <- exactTest(y, dispersion=bcv^2)

View(et$table)
structure(list(logFC = c(0, -0.67280976110796, -0.190706878123648, 
0, -1.06592239047733, 0), logCPM = c(0.456451013758882, 6.84518828528986, 
5.46338499556895, 0.456451013758882, 2.37389406911164, 0.456451013758882
), PValue = c(1, 0.433579402199822, 0.851984371429117, 1, 0.542580328250669, 
1)), row.names = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5", 
"Gene6"), class = "data.frame")

View(et$comparison)
c("Test_1", "Normal")
ADD REPLY
0
Entering edit mode

This is not the same as the edgeR logFC calculation.

ADD REPLY

Login before adding your answer.

Traffic: 2224 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6