Question: How is logFC and p-value estimated (for volcano plots)?
gravatar for LRStar
2.8 years ago by
United States
LRStar190 wrote:

I am trying to understand how volcano plots are made in DEG studies.

Say I have an example data frame that begins with the following first three rows:

> head(myDF)
              ID   C_S1_R1   C_S1_R2   C_S1_R3  C_S2_R1  C_S2_R2  C_S2_R3 
1 Glyma0002s50.1 4.9191791 5.2691071 4.8815068 5.172160 4.869919 5.435079
2 Glyma0003s50.1 6.4923718 7.3523922 7.1775645 5.984183 5.552393 5.920057
3 Glyma0005s50.1 0.8024444 1.8061819 2.4128941 3.508926 2.429191 2.024571

One pipeline a bioinformatician might follow is as follows:

d <- DGEList(counts = myDF[,2:7], group = c(rep("S1", 3), rep("S2", 3)), genes = myDF[,1])
d <- calcNormFactors(d)
d <- estimateCommonDisp(d)
d <- estimateTagwiseDisp(d)
d <- estimateTrendedDisp(d)
de <- exactTest(d, pair=c("S1", "S2"), dispersion = "tagwise")
deDF <-

At this point, I believe the data frame is ready to be constructed into a volcano plot - because it contains a column for the logFC and a column for the p-values:

> head(deDF)
  table.logFC table.logCPM table.PValue comparison          genes
1  0.06705792     4.730629    1.0000000         S1 Glyma0002s50.1
2 -0.23335090     4.977254    0.7492599         S2 Glyma0003s50.1
3  0.65611207     3.962589    0.5810553         S1 Glyma0005s50.1

I could then run a command like:

qplot(data=deDF, table.logFC, table.PValue)

to create the volcano plot, where each dot in the plot represents a gene and its logFC and p-value.

I have 3 main questions about this type of approach:

1) Why does the comparison column in deDF structure switch between S1 and S2 in alternating rows? I would have thought the information in this column was the result of both S1 and S2 comparisons for each row anyway. 2) How is the table.logFC column calculated? I tried to replicate the table.logFC values in deDF object from the read counts of the same rows of the myDF object, but cannot figure it out. Any advise would be appreciated! 3) How is the table.PValue column calculated? I imagine it would be a difference between the means of the S1 and S2 groups?

If anyone has any ideas toward any of these 3 questions, I would be happy/grateful to hear them!

edger foldchange R • 2.3k views
ADD COMMENTlink modified 22 months ago by jha.barkha810 • written 2.8 years ago by LRStar190

And have you read the edgeR paper?

ADD REPLYlink written 2.8 years ago by WouterDeCoster42k

Did you use the raw data or a normalised version of your data?

ADD REPLYlink written 2.5 years ago by Spacebio180

I used a normalized version of my data. Thank you.

ADD REPLYlink written 2.5 years ago by LRStar190

what is the need for padjacent value in volcano plot

ADD REPLYlink written 22 months ago by jha.barkha810

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

This question is off topic and would be more appropriate in a new thread, but please spend some more time thinking about it and write a slightly longer question.

ADD REPLYlink written 22 months ago by WouterDeCoster42k
gravatar for Carlo Yague
2.8 years ago by
Carlo Yague4.8k
Carlo Yague4.8k wrote:

As WDC implied, the answers to your three questions is in the paper and documentation of edgeR. You should read these thoroughly. However your interpretation of the p-value is completely wrong here :

I imagine it would be a difference between the means of the S1 and S2 groups?

If the fold-change reflects the variability (the ratio, actually) between the means of the groups, the p-value reflects the significance of that variability in regards to the variability within the groups. You should at least understand that before you even start thinking about building a volcano plot.

ADD COMMENTlink written 2.8 years ago by Carlo Yague4.8k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1082 users visited in the last hour