Tutorial:Creating Interactive Volcano Plots with R and Plot.ly
0
41
Entering edit mode
5.6 years ago
steve ★ 3.1k

Experienced Bioinformaticians are probably familiar with the standard technique for creating volcano plots in R. Now there is a fun and interactive alternative available using the Plot.ly library. This code sample will demonstrate how to use this library to create an interactive plot.

To preview the full code + interactive plot, see my post here:
https://stevekm.github.io/2016/09/24/Plot.ly-Volcano-Plot.html

I also have the same page listed in a Gist (minus the interactive plot output) here:

R plotly volcano plot Tutorial • 36k views
2
Entering edit mode

I have the same problem the colnames(diff_df) looks like normal to me:

"external_gene_name" "Fold"               "FDR"                "group"


head(diff_df)
external_gene_name     Fold      FDR                  group
1             CXCL12 3.471136 1.00e-20 Significant&FoldChange
3              KCNK5 2.953598 7.44e-14 Significant&FoldChange
4             MAMLD1 2.786176 6.65e-16 Significant&FoldChange
5              KCNQ5 2.777046 9.25e-14 Significant&FoldChange
6              RBBP8 2.758790 4.84e-45 Significant&FoldChange

0
Entering edit mode

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

0
Entering edit mode

The same problem here and the colnames(diff_df) looks the same as above in my case. I am still getting object not found error. Any suggestions?

0
Entering edit mode

I think you should ask a new question and add a link to this question from your new question. That would get you better results as the visibility would be better.

0
Entering edit mode

what exactly is the problem? Can you update the post to include the exact code you are running & the error message?

1
Entering edit mode

Hi and thank you for sharing this.

I usually use Trinity package that will draw Volcano plot for its DEG analysis (because I am new in R).

I will show you the related script and would you please kindly check them and see that is there any of the Trinity files that I can import them in R for drawing a volcano plot same as what you have provided?

  $TRINITY_HOME/Analysis/DifferentialExpression/run_DE_analysis.pl \ --matrix counts.matrix --method DESeq2 --samples_file samples_described.txt  The output from running the DE analysis will reside in the output directory you specified, and if not, a default directory name that includes the name of the method used (ie. DESeq2/ ). In this output directory, you'll find the following files for each of the pairwise comparisons performed: 1-${prefix}.sampleA_vs_sampleB.${method}.DE_results : the DE analysis results,\ including log fold change and statistical significance (see FDR column). 2-${prefix}.sampleA_vs_sampleB.${method}.MA_n_Volcano.pdf : MA and Volcano plots \ features found DE at FDR <0.05 will be colored red. Plots are shown\ with large (top) or small (bottom) points only for choice of aesthetics. 3-${prefix}.sampleA_vs_sampleB.${method}.Rscript : the R-script executed to perform the DE analysis. ADD REPLY 1 Entering edit mode The files you are referencing are located on your computer and not accessible in this way through the Internet. However if you check in my post I have shown the data format needed here: # preview the dataset; data required for the plot head(diff_df) ## external_gene_name Fold FDR ## 1 RP11-431K24.1 -4.13 1.04e-05 ## 2 UBE4B 2.42 8.71e-06 ## 3 UBIAD1 4.27 5.50e-06 ## 4 UBIAD1 1.89 2.16e-04 ## 5 UBIAD1 2.74 8.59e-05 ## 6 UBIAD1 3.42 1.39e-08  What this means is that the code I posted should work with any data that is held in a text based format (.txt, .csv, .tsv file, for example) with the following 3 columns: gene, fold change, significance (p value, etc.). If you have data in a file like this, you can try to load it into R using a function such as read.delim() as I used here (my preferred function for this), or one of the other options listed here. Once you have the data loaded in R, all you need to do is select the columns with the relevant information and pass them to the different aspects of the code shown and it should work. ADD REPLY 1 Entering edit mode Dear steve, Hi and Thank you The Trinity files has some more columns but I can make a new file only with the 3 columns you have kindly provided. by the way, instead of "gene" there would be "Trinity IDs" in that file, but I think I can change some of the most biologically important annotated trinity IDs with the genes that blast has showed. e.g. there are about 500k transcripts in the Trinity assembly and only 2000 of them are DEG and I just need to show 5 of them in each of my two condition in the volcano plot. What do you think about this approach? NOTE: I knew that the files in my computer are not reachable by you and I just mentioned those names because I guessed that may be you are familiar with Trinity output files. sorry and Thank you again. ADD REPLY 1 Entering edit mode You can use R to subset the data as needed (what I showed here), or you can even do it without subsetting the data at all by passing the references to the entries. The file you want is probably the DE_results file. I would try to open that file in something like Excel or TextEdit to verify its format so you can figure out how to load it into R. In particular, you will need to find out what the separator between columns is (the 'delimiter'); this might not be visible from Excel but should be visible from a raw editor such as TextEdit or Notepad or Sublime, and corresponds to the sep argument of read.delim() or read.table(). If it is comma separated, it would be sep=',', if it is tab separated it would be sep='\t' ADD REPLY 2 Entering edit mode Dear Steve, With your step-by-step description I could draw the volcano plot for my data and it was very awesome and interesting. Thank you. ADD REPLY 1 Entering edit mode Dear steve, Hi. Is there any way to show some genes other than the "five most DEG" on this volcano plot? Because some times the top 5 genes are the genes related to some stress reactions or even disease and the interesting biological DEGs situate at lover level of importance (e.g I have about 500 Upregulated DEGs for my "male" condition and the important genes locate at the level of 2,, 52, 145, 302 positins on the DEG-result.matrix) I hope I could describe the situation clearly. ADD REPLY 1 Entering edit mode yes, the way this example is set up, all you need to do is create a subset of the original dataframe that holds the genes you would like to label and use that in place of the top_peaks dataframe that I used here. The R code needed to obtain these entries will depend on your original data set. For example, if you don't mind hard-coding it and you know that you want entries 2, 52, 145, and 302, you could just do something like label_peaks <- diff_df[c(2, 52, 145, 302),] and pass this to the labeling list code. ADD REPLY 0 Entering edit mode Hi Steve, Do you have the Rscript for this showing the up-and-down regulated transcripts using volcano? I already have pre-filtered data. My significant data is FDR<1e-5. My upregulated is log2FC>0 while my down is log2FC<0. ADD REPLY 0 Entering edit mode Not sure I understand the question, since the original post documents all the R code used in this example. The transcript regulation values were produced using DiffBind using the ChIP-Seq workflow from this pipeline ADD REPLY 0 Entering edit mode This looks nice. I understand that you can't render the interactive plot directly here but could you upload it somewhere and link it in your post ? So we can get a better idea of how interactive the plot truly is. ADD REPLY 1 Entering edit mode I put the HTML output in my DropBox and added a link. I am looking into trying to put the plot up on their public site. EDIT: Link replaced by this page which includes the interactive version of the plot. ADD REPLY 0 Entering edit mode After logging on dropbox, I had access to the html file. Its cool ! Thanks. ADD REPLY 1 Entering edit mode oh yes I think you can just click the 'X' on the login window that Dropbox gives and still access the file. ADD REPLY 0 Entering edit mode I have my dropbox open but don't know how to get the link from your dropbox account. Just a simple quick question: does it mean that if I click on a dot on the plot, the gene name would show up and if I move the cursor elsewhere and the gene name disappears? That would be very cool. ADD REPLY 0 Entering edit mode Hi, Yes that's how it is ADD REPLY 0 Entering edit mode I took down the Dropbox link and instead hosted the plot online at the URL listed in the post. The gene name will appear as you mouse over the points. ADD REPLY 0 Entering edit mode That looks incredibly cool! I have something similar: d <- read.table("mydata.txt", header = TRUE); attach(d); d$Significant <- ifelse(FDR < 0.25, "FDR < 0.25", "Not Sig")

ggplot(d, aes(x = logFC, y = -log10(PValue))) +
geom_point(aes(color = Significant)) +
scale_color_manual(values = c("blue", "grey")) +
theme_bw(base_size = 16) +
geom_text_repel(
data = subset(d, (abs(logFC)>=4) & (FDR < 0.25)),
aes(label = GENE_ID),
size = 5,
)


The good part of it is that the gene names does not squeeze into each other and they look nice on the plot. The bad part is that it cannot do the interactive display. Suppose I just want to dynamically show the gene names when "mouse over", what should I do to change the code to do so?

Thanks much!

0
Entering edit mode

Dear Steve, Hi

• layout(annotations = a) Error in plot_ly(data = diff_df, x = Fold, y = -log10(FDR), text = external_gene_name, : object 'Fold' not found > p Error: ScalesList was built with an incompatible version of ggproto. Please reinstall the package that provides this extension.
1
Entering edit mode

Yes, that's an incompatibility error, we had the same with ggrepel and ggplot2, the best solution (so far, advice appreciated) is to crawl under your desk and cry.

0
Entering edit mode

Hi @WouterDeCoster,

But it was working well before !

Thanks for crawl and cry solution,

Is it any way to draw an non-interactive volcano plot with the same R script (maybe by removing some of its lines) ?

1
Entering edit mode

But it was working well before

Unfortunately I find myself in that situation often as well. Usually, some change in other installed system packages are to blame. For R, its frequently a change in the loaded version of gcc or zlib, though this may not be the case here.

The code posted in the OP here was designed specifically for Plotly, so you will not be able to use the same code verbatim for a base R plot, but since its essentially just a scatter plot it shouldn't need too many tweaks; my fallback would be the standard volcano plot code described here. If you want your plot to be available both in interactive and non-interactive forms from the same code, I would go with ggplot2 instead (see here), but of course that will not help if you are having ggplot2 compatibility issues.

0
Entering edit mode

Hi Stephen, this is amazingly helpful and is really helping with getting my head around R language for plotting.

I have a few ,what may be very simple questions. And feel free to point me towards other resources I should be reading.

1) How can I simply add a list of names that I specify from my gene name column, but maintain all the arrow information etc. I understand there must be a simple way of populating list a, with selected gene names. But I am too new to this to know how to do that.

2) is there a simple way of adding dashed lines to the graph where the cut-offs are.

3) is there a simple way to split the axis. I am thinking the y axis in my case, but am sure it would be similar for the x.

Thanks again,

It's been a great help already.

BW

1
Entering edit mode
1. Are you referring to the hard-coded black arrows on the plot? In this example, that is coming from the top_peaks data frame; you can substitute this for any df that has the gene name & values, just tweak the for loop shown to put the Fold Change, FDR, and Gene Name in the x, y, and text fields respectively in each entry in the list being built and pass it to the plot layout.

2 & 3. I honestly do not know off the top of my head, since writing this example I have moved almost all of my plotting to use ggplot2 instead. I find it to be advantageous because it makes much nicer 2-D plots, and can be imported directly into Plotly without modification. It looks like the Plotly docs have moved here, which might show you how to accomplish something like this using Plotly directly as in this example, or you could try building a similar plot in ggplot2 and import it to Plotly that way.

0
Entering edit mode

Great advice, just spent the morning honing my ggplot2 skills.

Works nicely, thanks

0
Entering edit mode

Hello Steve and thank you very much.

I am a very new user to R and I am following your script to create a volcano plot of some proteomic data that I have. All the commands are being accepted by the software but just at the very end, it pops up some errors. These are types of errors like the following one:

> # make the Plot.ly plot
> p <- plot_ly(data = diff_df, x = Fold, y = -log10(p.value), text = external_gene_name, mode = "markers", color = "group") %>%
+   layout(title ="DiffBind Volcano Plot")
Error in plot_ly(data = diff_df, x = Fold, y = -log10(p.value), text = external_gene_name,  :


So what I did was to modify it a little bit by adding a few "" and I manage to get a plot but it doen't look like a volcano. After my modifications the code is like that:

p <- plot_ly(data = diff_df, x = "Fold", y = "FDR", text = "external_gene_name", mode = "markers", color = "group") %>%
layout(title ="Volcano Plot") %>%
layout(annotations = a)
p


Sincerely, Yiannis

0
Entering edit mode

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

0
Entering edit mode

x = Fold is referring to the name of a column in the data frame. object 'Fold' not found could mean that column is not present in diff_df. You might want to check head(diff_df) and colnames(diff_df) to make sure it resembles what I showed in the example, and that all the columns being used in the plotting function are present. If it still does not look the way you expected, posting a screenshot here may help.

0
Entering edit mode

input_file <- "C:/Users/dverma2/Desktop/TCGA_plots_data/volcanoplot/res1.csv"

colnames(diff_df)
[1] "external_gene_name" "baseMean"           "Fold"               "lfcSE"              "stat"
[6] "pvalue"             "FDR"

diff_df <- diff_df[c("external_gene_name", "Fold", "FDR")]
external_gene_name     Fold      FDR
1    ENSG00000225972 6.816421 5.63e-84
2    ENSG00000262902 4.775611 8.30e-47
3    ENSG00000230916 4.650652 6.43e-40
4    ENSG00000232177 3.747488 1.11e-35
5    ENSG00000229604 4.151809 6.06e-27
6    ENSG00000253239 4.365511 2.05e-22

diff_df["group"] <- "NotSignificant"
diff_df[which(diff_df['FDR'] < 0.05 & abs(diff_df['Fold']) < 1.5 ),"group"] <- "Significant"
diff_df[which(diff_df['FDR'] > 0.05 & abs(diff_df['Fold']) > 1.5 ),"group"] <- "FoldChange"
diff_df[which(diff_df['FDR'] < 0.05 & abs(diff_df['Fold']) > 1.5 ),"group"] <- "Significant&FoldChange"
top_peaks <- diff_df[with(diff_df, order(Fold, FDR)),][1:5,]
top_peaks <- rbind(top_peaks, diff_df[with(diff_df, order(-Fold, FDR)),][1:5,])
a <- list()
for (i in seq_len(nrow(top_peaks))) {
+   m <- top_peaks[i, ]
+   a[[i]] <- list(
+     x = m[["Fold"]],
+     y = -log10(m[["FDR"]]),
+     text = m[["external_gene_name"]],
+     xref = "x",
+     yref = "y",
+     showarrow = TRUE,
+     ax = 20,
+     ay = -40
+   )
+ }

p <- plot_ly(data = diff_df, x = Fold, y = -log10(FDR), text = external_gene_name, mode = "markers", color = group) %>%
+   layout(title ="Volcano Plot") %>%
+   layout(annotations = a)
Error in plot_ly(data = diff_df, x = Fold, y = -log10(FDR), text = external_gene_name,  :

0
Entering edit mode

The error implies that there is no column called 'Fold' in your input data object.

If you still have issues, please try EnhancedVolcano in R / Bioconductor.

0
Entering edit mode

I dont have P Values, than how to calculate that.

0
Entering edit mode

Don't do this. If you have a question, open a new post. If you wish to add a comment, use ADD COMMENT.

0
Entering edit mode

How to export(save as excel file) upregulated gene from volcano plot and downregulated gene separately.

2
Entering edit mode

In R, you would use the write.table function to save a copy of the dataframe (diff_df in this case) after filtering it for the up and down regulated genes (examples here).

0
Entering edit mode

Dear Dr.Kelly,

I have executed the above script with the Gfold values (similar to logfold2 change). since we are working with non replicated data, we don't have a p value. Thus, I would like to have +fold change on x axis and -fold change on y axis. I am able to get the graph but a regular scatter plot. I get the error message as below: No trace type specified: Based on info supplied, a 'scatter' trace seems appropriate. Read more about this trace type -> https://plot.ly/r/reference/#scatter.

Kindly help me to trouble shoot.

-Dr. Megha H.S

0
Entering edit mode

Is there a straightforward way to change the colors in the grouping?

0
Entering edit mode

I am trying to run the same script but getting one error at the last step.

>  Rscript Volcano_Plot_Ploty.R
[1] "Gene"   "Fold"   "pvalue"
[1] 524   3
Gene     Fold  pvalue
1   LUZP1 -7.57373 0.00850
2 FAM71F1 -6.31787 0.00865
3    MBD4 -5.81446 0.00870
4  HOXD13  4.85073 0.02340
5  DNAJC2 -2.75493 0.03140
6    CHD4 -2.49614 0.05700
Error in plot_ly(data = diff_df, x = Fold, y = -log10(pvalue), text = Gene,  :
Calls: %>% -> eval -> eval -> plot_ly
Execution halted


any suggestion is most welcome.

1
Entering edit mode

sounds like your object diff_df is missing column 'Fold'. I would step through it line by line in Rstudio and double check that nothing is missing.

0
Entering edit mode

During the run of script, I am printing the data which u can see i have added with error msg which came on terminal. So clearly i m seeing 524 rows and 3 columns are present . what could be error since tried executing line by line and still facing same issue.

0
Entering edit mode

This is what i am trying to run

suppressPackageStartupMessages(library("plotly"))
diff_df["group"] <- "NotSignificant"
diff_df[which(diff_df['pvalue'] < 0.05 & abs(diff_df['Fold']) < 1.5 ),"group"] <- "Significant"
diff_df[which(diff_df['pvalue'] > 0.05 & abs(diff_df['Fold']) > 1.5 ),"group"] <- "FoldChange"
diff_df[which(diff_df['pvalue'] < 0.05 & abs(diff_df['Fold']) > 1.5 ),"group"] <- "Significant&FoldChange"
top_peaks <- diff_df[with(diff_df, order(Fold, pvalue)),][1:5,]
top_peaks <- rbind(top_peaks, diff_df[with(diff_df, order(-Fold, pvalue)),][1:5,])
a <- list()
for (i in seq_len(nrow(top_peaks))) {
m <- top_peaks[i, ]
a[[i]] <- list(
x = m[["Fold"]],
y = -log10(m[["pvalue"]]),
text = m[["Gene"]],
xref = "x",
yref = "y",
showarrow = TRUE,
ax = 20,
ay = -40
)
}

p <- plot_ly(data = diff_df, x = Fold, y = -log10(pvalue), text = Gene, mode = "markers", color = group) %>%
layout(title ="Volcano Plot") %>%
layout(annotations = a)


My file looks like this (tab separated, formatted for display below)

Gene                Fold        pvalue
LUZP1               -7.57373    0.0085
FAM71F1             -6.31787    0.00865
MBD4                -5.81446    0.0087
HOXD13              4.85073     0.0234
DNAJC2              -2.75493    0.0314
CHD4                -2.49614    0.057
EEA1                -5.73976    0.0623
ENSBTAG00000043567  -4.13538    0.07295
TMEM220             -2.10145    0.0914
NHLH2               -1.7285     0.10305


I am still facing same error.

1
Entering edit mode

It works for me after little modification. I have put "" and for the object Fold and FDR, I have done something like this:

p <- plot_ly(data = diff_df, x = diff_df$Fold, y = -log10(diff_df$FDR), text = "external_gene_name", mode = "markers", color = "group")

0
Entering edit mode

p <- plot_ly(data = diff_df, x = diff_df$Fold, y = -log10(diff_df$FDR), text = diff_df$external_gene_name, mode = "markers", color = diff_df$group) %>% layout(title = "Volcano plot") %>% layout(annotations = a)

p

0
Entering edit mode

Please tidy your code with the 101 010 button

0
Entering edit mode