Question: Volcano plot help code
4
2.4 years ago by
anasofiamoreira9470 wrote:

Good Morning, I'm sorry to disturb, but as anyone done these volcano plot? I'm only able to do the traditional one, i'm kind knew too these field. Thanks Ana Sofia Moreira

rna-seq R • 11k views
modified 2.3 years ago by jordi.planells230 • written 2.4 years ago by anasofiamoreira9470
1

You can check ggplot2 documentation, ggplot2 tutorial, ggrepel. These are enough to plot what you have asked for.

1
1

Hi Kevin, I am trying to use your code with my data but I am not being able to produce any plot (eventhough I don't get any error messages back). So I wonder wether you could help me to localize my problem. The code I am using is the following

``````results <- read.table("DEseq2 output.txt", sep = "\t", header = T, as.is = T)
toptable <- as.data.frame(results)

pdf("Volcanoplot_DoubleKD_gene_type.pdf",width = 10.15, height = 8.10)
library(ggplot2)

EnhancedVolcanoDESeq2 <- function(toptable, AdjustedCutoff=0.05, LabellingCutoff=0.05, FCCutoff=2.0, main="DoubleKD VolcanoPlot")
{
toptable\$Significance <- "NS"
toptable\$Significance[(abs(toptable\$log2FoldChange) > FCCutoff)] <- "FC"
table(toptable\$Significance)
toptable\$Significance <- factor(toptable\$Significance, levels=c("NS", "FC", "FDR", "FC_FDR"))

plot <- ggplot(toptable, aes(x=log2FoldChange, y=-log10(padj))) +

#   Colour based on factors set a few lines up
#   Set size of points
geom_point(aes(color=factor(Significance)), alpha=1/2, size=0.8) +

#Choose which colours to use; otherwise, ggplot2 choose automatically (order depends on how factors are ordered in toptable\$Significance)
scale_color_manual(values=c(NS="grey30", FC="forestgreen", FDR="royalblue", FC_FDR="red2"), labels=c(NS="NS", FC=paste("LogFC>|", FCCutoff, "|", sep=""), FDR=paste("FDR Q<", AdjustedCutoff, sep=""), FC_FDR=paste("FDR Q<", AdjustedCutoff, " & LogFC>|", FCCutoff, "|", sep=""))) +

#Set the size of the plotting window
theme_bw(base_size=24) +

#Modify various aspects of the plot text and legend
theme(legend.background=element_rect(),
plot.title=element_text(angle=0, size=12, face="bold", vjust=1),

panel.grid.major=element_blank(), #Remove gridlines
panel.grid.minor=element_blank(), #Remove gridlines

axis.text.x=element_text(angle=0, size=12, vjust=1),
axis.text.y=element_text(angle=0, size=12, vjust=1),
axis.title=element_text(size=12),

#Legend
legend.position="top",            #Moves the legend to the top of the plot
legend.key=element_blank(),       #removes the border
legend.key.size=unit(0.5, "cm"),  #Sets overall area/size of the legend
legend.text=element_text(size=8), #Text size
title=element_text(size=8),       #Title text size
legend.title=element_blank()) +       #Remove the title

#Change the size of the icons/symbols in the legend
guides(colour = guide_legend(override.aes=list(size=2.5))) +

#Set x- and y-axes labels
xlab(bquote(~Log[2]~ "fold change")) +

#Set the axis limits
#xlim(-6.5, 6.5) +
#ylim(0, 100) +

#Set title
ggtitle(main) +

#Tidy the text labels for a subset of genes
size=2.25,
#segment.color="black", #This and the next parameter spread out the labels and join them to their points by a line
#segment.size=0.01,
check_overlap=TRUE,
vjust=1.0) +

#Add a vertical line for fold change cut-offs
geom_vline(xintercept=c(-FCCutoff, FCCutoff), linetype="longdash", colour="black", size=0.4) +

#Add a horizontal line for P-value cut-off

return(plot)
}
``````

The DEseq2 output has the regular structure, being: Contig baseMean log2FoldChange lfcSE stat pvalue padj NReads FBgn0000003 135701.1304 3.841593895 0.218300714 17.59771566 2.56E-69 3.72E-66 513393 etc.

Jordi Planells

1

Hi Jordi,

First ensure that all devices are switched off by running `dev.off()` until there are no more devices to close.

Then, you should try to run the function like this:

``````library(ggplot2)
EnhancedVolcanoDESeq2 <- function(toptable, AdjustedCutoff=0.05, LabellingCutoff=0.05, FCCutoff=2.0, main="DoubleKD VolcanoPlot")
{
toptable\$Significance <- "NS"
toptable\$Significance[(abs(toptable\$log2FoldChange) > FCCutoff)] <- "FC"
table(toptable\$Significance)
toptable\$Significance <- factor(toptable\$Significance, levels=c("NS", "FC", "FDR", "FC_FDR"))

plot <- ggplot(toptable, aes(x=log2FoldChange, y=-log10(padj))) +

#   Colour based on factors set a few lines up
#   Set size of points
geom_point(aes(color=factor(Significance)), alpha=1/2, size=0.8) +

#Choose which colours to use; otherwise, ggplot2 choose automatically (order depends on how factors are ordered in toptable\$Significance)
scale_color_manual(values=c(NS="grey30", FC="forestgreen", FDR="royalblue", FC_FDR="red2"), labels=c(NS="NS", FC=paste("LogFC>|", FCCutoff, "|", sep=""), FDR=paste("FDR Q<", AdjustedCutoff, sep=""), FC_FDR=paste("FDR Q<", AdjustedCutoff, " & LogFC>|", FCCutoff, "|", sep=""))) +

#Set the size of the plotting window
theme_bw(base_size=24) +

#Modify various aspects of the plot text and legend
theme(legend.background=element_rect(),
plot.title=element_text(angle=0, size=12, face="bold", vjust=1),

panel.grid.major=element_blank(), #Remove gridlines
panel.grid.minor=element_blank(), #Remove gridlines

axis.text.x=element_text(angle=0, size=12, vjust=1),
axis.text.y=element_text(angle=0, size=12, vjust=1),
axis.title=element_text(size=12),

#Legend
legend.position="top",            #Moves the legend to the top of the plot
legend.key=element_blank(),       #removes the border
legend.key.size=unit(0.5, "cm"),  #Sets overall area/size of the legend
legend.text=element_text(size=8), #Text size
title=element_text(size=8),       #Title text size
legend.title=element_blank()) +       #Remove the title

#Change the size of the icons/symbols in the legend
guides(colour = guide_legend(override.aes=list(size=2.5))) +

#Set x- and y-axes labels
xlab(bquote(~Log[2]~ "fold change")) +

#Set the axis limits
#xlim(-6.5, 6.5) +
#ylim(0, 100) +

#Set title
ggtitle(main) +

#Tidy the text labels for a subset of genes
size=2.25,
#segment.color="black", #This and the next parameter spread out the labels and join them to their points by a line
#segment.size=0.01,
check_overlap=TRUE,
vjust=1.0) +

#Add a vertical line for fold change cut-offs
geom_vline(xintercept=c(-FCCutoff, FCCutoff), linetype="longdash", colour="black", size=0.4) +

#Add a horizontal line for P-value cut-off

return(plot)
}

results <- read.table("DEseq2 output.txt", sep = "\t", header = T, as.is = T)
toptable <- as.data.frame(results)

pdf("Volcanoplot_DoubleKD_gene_type.pdf",width = 10.15, height = 8.10)
EnhancedVolcanoDESeq2(toptable, AdjustedCutoff=0.05, LabellingCutoff=0.05, FCCutoff=2.0, main="DoubleKD VolcanoPlot")
dev.off()
``````

Does that work?

Sorry for late reply I've been out for some days. I am afraid it doesn't work, I have used dev.off() untill I got this message back: Error in dev.off() : cannot shut down device 1 (the null device). Still I don't get any output. I am not able to figure out why

Can you show the exact sequence of commands that you're using? - sorry

Of course!

``````toptable <- read.table("topTable_DESeq2.tabular", sep = "\t", header = T, as.is = T)
attach(toptable)

library(ggplot2)
EnhancedVolcanoDESeq2 <- function(toptable, AdjustedCutoff=0.05, LabellingCutoff=0.05, FCCutoff=2.0, main="VolcanoPlot")
``````

and the plot commands exactly the same code you posted in github

``````{
toptable\$Significance <- "NS"
toptable\$Significance[(abs(toptable\$log2FoldChange) > FCCutoff)] <- "FC"
table(toptable\$Significance)
toptable\$Significance <- factor(toptable\$Significance, levels=c("NS", "FC", "FDR", "FC_FDR"))

plot <- ggplot(toptable, aes(x=log2FoldChange, y=-log10(padj))) +

#   Colour based on factors set a few lines up
#   Set size of points
geom_point(aes(color=factor(Significance)), alpha=1/2, size=0.8) +

#Choose which colours to use; otherwise, ggplot2 choose automatically (order depends on how factors are ordered in toptable\$Significance)
scale_color_manual(values=c(NS="grey30", FC="forestgreen", FDR="royalblue", FC_FDR="red2"), labels=c(NS="NS", FC=paste("LogFC>|", FCCutoff, "|", sep=""), FDR=paste("FDR Q<", AdjustedCutoff, sep=""), FC_FDR=paste("FDR Q<", AdjustedCutoff, " & LogFC>|", FCCutoff, "|", sep=""))) +

#Set the size of the plotting window
theme_bw(base_size=24) +

#Modify various aspects of the plot text and legend
theme(legend.background=element_rect(),
plot.title=element_text(angle=0, size=12, face="bold", vjust=1),

panel.grid.major=element_blank(), #Remove gridlines
panel.grid.minor=element_blank(), #Remove gridlines

axis.text.x=element_text(angle=0, size=12, vjust=1),
axis.text.y=element_text(angle=0, size=12, vjust=1),
axis.title=element_text(size=12),

#Legend
legend.position="top",            #Moves the legend to the top of the plot
legend.key=element_blank(),       #removes the border
legend.key.size=unit(0.5, "cm"),  #Sets overall area/size of the legend
legend.text=element_text(size=8), #Text size
title=element_text(size=8),       #Title text size
legend.title=element_blank()) +       #Remove the title

#Change the size of the icons/symbols in the legend
guides(colour = guide_legend(override.aes=list(size=2.5))) +

#Set x- and y-axes labels
xlab(bquote(~Log[2]~ "fold change")) +

#Set the axis limits
#xlim(-6.5, 6.5) +
#ylim(0, 100) +

#Set title
ggtitle(main) +

#Tidy the text labels for a subset of genes
size=2.25,
#segment.color="black", #This and the next parameter spread out the labels and join them to their points by a line
#segment.size=0.01,
check_overlap=TRUE,
vjust=1.0) +

#Add a vertical line for fold change cut-offs
geom_vline(xintercept=c(-FCCutoff, FCCutoff), linetype="longdash", colour="black", size=0.4) +

#Add a horizontal line for P-value cut-off

return(plot)
}
``````

Thank you so much again

13
2.4 years ago by
Kevin Blighe65k
Kevin Blighe65k wrote:

Edit 28th March, 2019:

this is now a Bioconductor package: EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling

## --------------------

That looks like an up-side down volcano plot, which is unusual. Why not just do a regular one?

You can use my EnhancedVolcano function on my GitHub page (in part developed with a Junior Bioinformatician): https://github.com/kevinblighe (https://github.com/kevinblighe/EnhancedVolcano)

Apart from anything else, it makes the use of `ggrepel` in order to fit as many gene labels as possible.

1

PS - if you prefer to have the lines connecting the gene names to the points, then un-comment the following lines in my function:

``````#segment.color="black", #This and the next parameter spread out the labels and join them to their points by a line
#segment.size=0.01
``````

Hi @ Kevin Blighe,

Is there any way to solve "package ‘EnhancedVolcano’ is not available (for R version 3.3.3 RC)" error in Rstudio?

2

Ah, apologies, it is not on Bioconductor. You can just copy the R script via git clone or just download it, and then load it into your session with:

``````source("EnhancedVolcano.R")
``````

I have not yet standardised the code such that it would be ready for Bioconductor. Hope that's okay.

1

Error in abs(toptable\$log2FoldChange) : non-numeric argument to mathematical function i m trying to use this but i get this error ..sorry my bad i had some NA in data frame , now removed but i still dont get the plot...finally managed to get the plot "RStudioGD" something with the graphics output after clearing dev.off() many time somehow it worked

hi @Kevin is it possible to do or label only few of the selected genes like , i see that most of them don;t show every thing but selected ones, i tried making a vector to label but i couldn't do it..it would be really helpful if you can show

1

Yes, the function works as follows:

• you can label all genes with or without overlaps
• you can label all genes without overlaps

The key is the `check_overlap=TRUE/FALSE` parameter that is passed to `geom_text()` within the function

If you just want to supply a specific list of gene names, like myCustomLabel=c("GeneA", "", "GeneC", "", "", "GeneF", ...), then you should be able to pass this to the function and use it again in the geom_text() function:

``````geom_text(data=subset(toptable, padj<LabellingCutoff & abs(log2FoldChange)>FCCutoff),
aes(label=myCustomLabel),
size=2.25,
#segment.color="black", #This and the next parameter spread out the labels and join them to their points by a line
#segment.size=0.01,
check_overlap=TRUE,
vjust=1.0) +
``````

How do solve the non-numeric argument? I've removed NA and convert he E-number but still having the same error. Can you share with me the solution? Thanks

Can you paste the commands that you use before you attempt to generate the volcano plot?

Also, could you run the `str()` command on you data and paste the output?

Thanks!

Hi Kevin,

Apology for late response. I manage to solve the problem. My file is in csv format so there is an error when import the file using read.table command. So I just reformat the file into tab deliminated format, reran back the "EnhancedVolcano.R" script and manage to generate cool volcano plot. However the gene IDs were not based on ENSEMBLE ID. DId I do something wrong ?

1

Hey, that's great! The gene names will be whatever you have used, so, they could be ENSEMBL IDs, RefSeq IDs, etc.

Note that I have updated this function and it is currently being submitted to Bioconductor: https://github.com/kevinblighe/EnhancedVolcano

It is now much more flexible. You can install the first version like this:

``````require(devtools)

install_github("kevinblighe/EnhancedVolcano")

require(EnhancedVolcano)

?EnhancedVolcano
``````

Hi Kevin

I've tested the https://github.com/kevinblighe/EnhancedVolcano. however the volcano plot generated without any point. I'm using the same dataset for previous run. Here are the command I used

``````results2<-read.table("res.infected_norm-vs-control-noNA.txt", header=TRUE)
EnhancedVolcano(results2,
lab = rownames(results2),
x = "log2FoldChange",
y = "pvalue",
pCutoff = 0.05,
FCcutoff = 2,
transcriptPointSize = 1.5,
transcriptLabSize = 3.0,
title= "infected vs uninfected")
``````

Weird... what is the output of

``````head(results2)
str(results2)
``````

here you are

``````> head(results2)
baseMean log2FoldChange    lfcSE      stat   pvalue
ENSMUSG00000033860 1168.9376      -27.24585 4.784735 -5.694328 1.24e-08
ENSMUSG00000057400  993.8006      -26.97533 4.784741 -5.637783 1.72e-08
ENSMUSG00000033831 1201.9378      -26.87136 4.784734 -5.616061 1.95e-08
ENSMUSG00000078520  551.5458      -26.32777 4.784771 -5.502410 3.75e-08
ENSMUSG00000059481  686.4623      -26.24442 4.784757 -5.485006 4.13e-08
ENSMUSG00000113555  807.0622      -25.97442 4.457547 -5.827064 5.64e-09
ENSMUSG00000033860 2.19e-07
ENSMUSG00000057400 2.96e-07
ENSMUSG00000033831 3.32e-07
ENSMUSG00000078520 6.09e-07
ENSMUSG00000059481 6.65e-07
ENSMUSG00000113555 1.06e-07

> str(results2)
'data.frame':   33044 obs. of  6 variables:
\$ baseMean      : num  1169 994 1202 552 686 ...
\$ log2FoldChange: num  -27.2 -27 -26.9 -26.3 -26.2 ...
\$ lfcSE         : num  4.78 4.78 4.78 4.78 4.78 ...
\$ stat          : num  -5.69 -5.64 -5.62 -5.5 -5.49 ...
\$ pvalue        : num  1.24e-08 1.72e-08 1.95e-08 3.75e-08 4.13e-08 5.64e-09 6.14e-08 8.28e-08 8.95e-08 1.03e-07 ...
\$ padj          : num  2.19e-07 2.96e-07 3.32e-07 6.09e-07 6.65e-07 1.06e-07 9.57e-07 1.26e-06 1.35e-06 1.54e-06 ...
``````

Thanks! Nothing seems out of the ordinary. I was able to plot the data-points you pasted.

Have you got the ggrepel package installed? - it's required, but wasn't required in the 'pre-development' version that I posted on Biostars.

I can see that I need to improve the code to produce helpful error messages. It's still passing through the Bioconductor submission process.

If worse comes to worse, just continue to use the version that actually worked for you. Thanks!

1

yup, I've ggrepel installed.

the volcano plot generated using the script is good enough. Thank you very much, appreciate your help!!

1

I think that i'v solved this issue in the most recent version (0.99.4) on GitHub. I encountered the problem myself recently. Thanks for the input!

1

I've rerun my dataset using the recent version. Yup now I'm able to generate the plot. Thanks Kevin

Hi Kevin, This seems like some great package, however I am running into the same "empty plot" error as above. I am pasting the str(data) results of 'data'(i.e. my input file). Could you please let me know,is the file format is wrong? My code for the plot is same as above:

``````EnhancedVolcano(data,
lab = rownames(data),
x = "log2FoldChange",
y = "qvalue",
pCutoff = 0.05,
FCcutoff = 2,
transcriptPointSize = 1.5,
transcriptLabSize = 3.0,
title= "Treated vs untreated")
``````

Edit: I don't know how to upload image, so copy pasting the str(data) results.( Sorry)

``````'data.frame':   58676 obs. of  14 variables:
\$ GeneId                            : Factor w/ 58676 levels "TBIG000001","TBIG000002",..: 641 4468 9003 9153 9274 9365 10275 11524 14429 15694 ...
\$ GeneAcc                           : Factor w/ 58676 levels "ENSG00000000003",..: 16655 5426 4353 36091 13318 6834 10698 622 13091 4184 ...
\$ NcbiGene                          : Factor w/ 26853 levels "1","10","100",..: 17517 22283 20484 17872 9266 25141 14238 7687 10038 17173 ...
\$ Type                              : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
\$ GeneName                          : Factor w/ 57131 levels "1-Dec","1-Mar",..: 41663 21692 22354 54930 13393 20863 30970 23741 27077 41135 ...
\$ Desc                              : Factor w/ 45340 levels "","-","1-acylglycerol-3-phosphate O-acyltransferase 1 [Source:HGNC Symbol;Acc:HGNC:324]",..: 26809 5467 2846 41883 914 2306 13008 4060 9260 26723 ...
\$ EXP.Control.FPKM                  : Factor w/ 5482 levels "-","0","0.01",..: 3796 3686 5247 599 3716 2286 1010 4941 5169 452 ...
\$ EXP.Case.FPKM                     : Factor w/ 5667 levels "-","0","0.01",..: 1691 4570 1429 553 4117 3494 3340 265 2342 1686 ...
\$ DEG.Control.vs.Case.QV.0.05.SELECT: Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
\$ DEG.Control.vs.Case.PV            : Factor w/ 6633 levels "-","0.0001","0.0002",..: 2 2 2 2 2 2 2 2 2 2 ...
\$ qvalue                            : num  0.0287 0.0154 0.0154 0.0154 0.0154 0.0154 0.0154 0.0154 0.0154 0.0154 ...
\$ log2FoldChange                    : num  -4.54 -2.79 -2.47 3.1 -3.04 -2.63 -2.03 -2.95 -1.83 -2.6 ...
\$ DEG.Control.vs.Case.UP_DOWN       : Factor w/ 3 levels "-","DOWN","UP": 2 2 2 3 2 2 2 2 2 2 ...
\$ GeneOntology                      : Factor w/ 16446 levels "-","GO:0000002,GO:0000122,GO:0000165,GO:0000790,GO:0000977,GO:0000978,GO:0000981,GO:0001077,GO:0001085,GO:0001205,G"| __truncated__,..: 11435 8571 8697 7835 10594 955 8730 11181 8371 8192 ...
``````

Hey, are the rownames even set for your `data` object? Maybe try

``````lab = as.character(data\$GeneId)
``````

Yes, the row names are set. same thing even after trying lab= as.character(data\$GeneId). I observed that lot of my rows are NA's for foldchange and qvalue. do you think this is affecting the results?

hi @Kevin Blighe, anything on this please?

Hey, when you say "a lot" [of your rows are NA], you equally imply that not everything is NA (?). Perhaps, therefore, the problem lies elsewhere outside of `EnhancedVolcano()`.

What does the following return:

``````table( is.na(as.character(data\$GeneId)) )

table( is.na(data\$GeneId) )
``````

``````table( is.na(as.character(data\$GeneId)) ) returns < table of extent 0 >
table( is.na(data\$GeneId) ) returns < table of extent 0 >
``````

I wonder what could the problem outside enhanced volcano be. My input is a csv file with the columns; GeneName, log2FoldChange, qvalue,Control Vs Case UP_Down

column "Control Vs Case UP_Down" just tells me whether a gene is up regulated or downregulated.

``````table( as.character(data\$GeneId) == 'NA' )
``````

how did you produce the `data` object? It is strange that virtually all columns are factors.

`table( as.character(data\$GeneId) == 'NA' )` also gives me < table of extent 0 > I was given the csv file with above listed columns, so am not very sure which procedure was used to arrive at the results in csv file. Can we not convert factors into a type that is required by EnhancedVolcano function(i.e. if factors are creating a problem)?

Yes, but `as.character()` should convert the factors into characters. You have never shown your resulting plot; so, I can neither confirm that anything is even erroneous here. How do you read the data into R?

I read using read.csv. I am not sure how to upload the image. could you please tell me? or i can send it to you else where if that is okay with you.

kevin, this tool is very good in providing beautiful figure. but your script use airway dataset, right? but i got confused with the series of replies. can you redirect me to the newer ver so i can follow it using my data?

Hey, assuming that you have any differential expression analysis results table with p-value and fold changes, then you can use the package EnhancedVolcano. What have you produced so far? Can you post an example?

thanks for the quick reply. i am fairly new to this so i had difficulty understanding how things work here. anyway, yes, i already have my DE data, which i already filtered. it consist of my_transcript_name, log2FC, & FDR value. since my data is already filtered, i just wanted to plot it using volcano plot. i started with the basic command from your script because i just want to draw out of something with my data. however, i couldn't arrive with a figure.

toptable <- as.data.frame(results)

install.packages("ggplot2")

library(ggplot2)

plot <- ggplot (toptable, aes(x=log2FC,y=FDR))

Oh, this code is now a comprehensive package on Bioconductor: https://github.com/kevinblighe/EnhancedVolcano

In EnhancedVolcano, for the most basic plot, you would do:

``````library(EnhancedVolcano)
EnhancedVolcano(toptable,
lab = rownames(toptable),
x = 'log2FC',
y = 'FDR')
``````

You may have to install the package first.

tried installing the package but it says

Warning in install.packages: package 'EnhancedVolcano' is not available (for R ver 3.5.3)

what to do?

Try to install not via `install.packages()`

``````if (!requireNamespace('BiocManager', quietly = TRUE))
install.packages('BiocManager')
BiocManager::install('EnhancedVolcano')
``````

In the unlikely event that that does not work, this will work:

``````devtools::install_github('kevinblighe/EnhancedVolcano')
``````

(you may need to install devtools first)