Question: How is logFC and p-value estimated (for volcano plots)?
1
23 months ago by
LRStar10
United States
LRStar10 wrote:

I am trying to understand how volcano plots are made in DEG studies.

Say I have an example data frame that begins with the following first three rows:

``````> head(myDF)
ID   C_S1_R1   C_S1_R2   C_S1_R3  C_S2_R1  C_S2_R2  C_S2_R3
1 Glyma0002s50.1 4.9191791 5.2691071 4.8815068 5.172160 4.869919 5.435079
2 Glyma0003s50.1 6.4923718 7.3523922 7.1775645 5.984183 5.552393 5.920057
3 Glyma0005s50.1 0.8024444 1.8061819 2.4128941 3.508926 2.429191 2.024571
``````

One pipeline a bioinformatician might follow is as follows:

``````d <- DGEList(counts = myDF[,2:7], group = c(rep("S1", 3), rep("S2", 3)), genes = myDF[,1])
d <- calcNormFactors(d)
d <- estimateCommonDisp(d)
d <- estimateTagwiseDisp(d)
d <- estimateTrendedDisp(d)
de <- exactTest(d, pair=c("S1", "S2"), dispersion = "tagwise")
deDF <- as.data.frame(de)
``````

At this point, I believe the data frame is ready to be constructed into a volcano plot - because it contains a column for the logFC and a column for the p-values:

``````> head(deDF)
table.logFC table.logCPM table.PValue comparison          genes
1  0.06705792     4.730629    1.0000000         S1 Glyma0002s50.1
2 -0.23335090     4.977254    0.7492599         S2 Glyma0003s50.1
3  0.65611207     3.962589    0.5810553         S1 Glyma0005s50.1
``````

I could then run a command like:

``````qplot(data=deDF, table.logFC, table.PValue)
``````

to create the volcano plot, where each dot in the plot represents a gene and its logFC and p-value.

1) Why does the comparison column in deDF structure switch between S1 and S2 in alternating rows? I would have thought the information in this column was the result of both S1 and S2 comparisons for each row anyway. 2) How is the table.logFC column calculated? I tried to replicate the table.logFC values in deDF object from the read counts of the same rows of the myDF object, but cannot figure it out. Any advise would be appreciated! 3) How is the table.PValue column calculated? I imagine it would be a difference between the means of the S1 and S2 groups?

If anyone has any ideas toward any of these 3 questions, I would be happy/grateful to hear them!

edger foldchange R • 1.6k views
modified 12 months ago by jha.barkha810 • written 23 months ago by LRStar10
1

And have you read the edgeR paper?

Did you use the raw data or a normalised version of your data?

I used a normalized version of my data. Thank you.

what is the need for padjacent value in volcano plot

Please use `ADD COMMENT` or `ADD REPLY` to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

This question is off topic and would be more appropriate in a new thread, but please spend some more time thinking about it and write a slightly longer question.

4
23 months ago by
Carlo Yague4.4k
Belgium
Carlo Yague4.4k wrote:

As WDC implied, the answers to your three questions is in the paper and documentation of edgeR. You should read these thoroughly. However your interpretation of the p-value is completely wrong here :

I imagine it would be a difference between the means of the S1 and S2 groups?

If the fold-change reflects the variability (the ratio, actually) between the means of the groups, the p-value reflects the significance of that variability in regards to the variability within the groups. You should at least understand that before you even start thinking about building a volcano plot.