Question: Showing FDR and log2FC in Volcano plot using R
1
gravatar for Farbod
8 months ago by
Farbod3.1k
Toronto
Farbod3.1k wrote:

Dear Biostars, Hi

In order to bioinformaticly show DEGs of RNA-seq in a volcano plot, I intend to use R, showing DEGs with blue spots and not significantly expressed genes with red dots.

The edgeR package which is embedded in Trinity software, uses FDR (>0.001) and log2FC (>2) of the matrix file as the threshold.

I will show my R code and the head of my matrix file. My QUESTIONs are:

1- Which I should use, FDR or -log10(FDR) ? (e.g in aes(x=logFC, y =-log10(FDR),) (when I used FDR the volcano was funny and when I used -10log(FDR), it seems that the number of blue dots that are presenting the DEGs are more than number of DEGs reported by Trinity package.

2- which I should use; lof FC or log2FC?

NOTE1: The Code

library(ggplot2)
data <- read.table("trans_counts.matrix.A_vs_B.edgeR.DE_results", header=TRUE)

##Identify the genes that have a FDR < 0.001
data$threshold = as.factor(data$FDR < 0.001)

##Construct the plot object
g <- ggplot(data=data, 
            aes(x=logFC, y =-log10(FDR), 
                colour=threshold)) +
  geom_point(alpha=0.4, size=1.75) +
  xlim(c(-14, 14)) +
  xlab("log2 fold change") + ylab("-log10 FDR") +
  theme_bw() +
  theme(legend.position="none")
g

NOTE2: Head of file

> external_gene_name    logFC   logCPM  P.Value FDR
> 
> TRINITY_DN179827_c0_g2_i4 -13.8732718193368   4.61855606743036    3.03811154383156e-30    1.87897780840196e-24
> 
> TRINITY_DN194528_c5_g3_i4 13.2697070220115    4.01618042373378    1.21936893870205e-27    3.7707094407506e-22
> 
> TRINITY_DN192883_c9_g2_i2 11.2269211011449    3.82178000481344    3.20153046019011e-26    6.60015780727773e-21
volcano rna-seq R • 1.5k views
ADD COMMENTlink modified 8 months ago by cpad01128.2k • written 8 months ago by Farbod3.1k

I think your logFC values are is already log2. In general, volcano plots are logFC vs P-value (-logged (10), in most cases). There are manuscripts that used FDR in volcano plots. Almost similar post here: Volcano plot p-value or FDR?

ADD REPLYlink written 8 months ago by cpad01128.2k

Thanks, In "aes(x=logFC, y =-log10(FDR)" I should use FDR or -log10(FDR) ?

ADD REPLYlink written 8 months ago by Farbod3.1k
1

It makes sense (to me) to use -log10 (FDR) instead of FDR. Talk to your PI or client or the intended journal, once you are done with it and iterate it as per the feedback. Please label the axis accordingly.

ADD REPLYlink modified 8 months ago • written 8 months ago by cpad01128.2k
1

Yes, I have seen both being used, i.e., -log10(P) and -log10(FDR). Either is legitimate.

ADD REPLYlink written 8 months ago by Kevin Blighe25k

For question 1, I think the problem is you set the threshold on the original FDR values (data$FDR < 0.001), but then apply it to transformed values (y =-log10(FDR))

ADD REPLYlink written 8 months ago by h.mon18k

Yes,

In original research I used FDR = 0.001 and log2FC =2 and I want to draw same volcano.

but

when I set the threshold as "data$threshold = as.factor(data$-log10(FDR) < 0.001) " it says :"Error in eval(expr, envir, enclos) : object 'threshold' not found"

when I use "data$threshold = as.factor(data$P.Value < 0.001)" and " aes(x=logFC, y =-log10(FDR), ", the volcano is appeared.

you can also see that in Trinity, volcano is according to -log10FDR.

What do you think? what should I do ?

ADD REPLYlink modified 8 months ago • written 8 months ago by Farbod3.1k
data$threshold = as.factor(-log10(data$FDR) < 3)

Try in these lines.

ADD REPLYlink written 8 months ago by cpad01128.2k

Hi, I tried it,

The blue spots change to red and red dot changed to blue!

ADD REPLYlink written 8 months ago by Farbod3.1k
1

my bad. It should be >, not <.

data$threshold = as.factor(data$FDR < 0.001)
data$threshold = as.factor(-log10(data$FDR) > 3)

they must be same. Looking back, this should not make any difference. Your original code should still be good as threshold (T/F) is on a list of genes.

ADD REPLYlink modified 8 months ago • written 8 months ago by cpad01128.2k

Hi @cpad0112, I have another question and SORRY if it is not a bioinformatic one,

Is there any way to change these red and blue colours of dots (e.g into red and green)?

as I could not guess which part of my R script is resbomsible for that by looking at it's line of command.

Thanks

ADD REPLYlink written 8 months ago by Farbod3.1k
1

Try adding following line ( as a separate line)

scale_color_manual(values = c("green", "red")) +
ADD REPLYlink written 8 months ago by cpad01128.2k

hi cpad0112,

I would also like to draw a volcano map showing my diff expressed transcripts. But first, I want to filter my data from edgeRglm. After running edgreglm, i got logfc, logcpm, pval, & adjfdr val. I would like to log2 the logfc and logcpm values. How to do that? Or, is my logfc and logcpm already in log2 state?

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by bioinfool0

I am not sure about your data output. But in general, most of them output log transformed vales for fold change calculation. Please note that volcano plot is between lfc (fold changes) and p/adj p values.

ADD REPLYlink written 5 weeks ago by cpad01128.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1142 users visited in the last hour