Question: Label selected genes in volcano plot from ggplot
16 months ago by
Germany
saamar.rajput60 wrote:

I have a data frame of differentially expressed genes from edgeR. I am trying to make a volcano plot from it, but I want only selected genes of interest to be labelled on the plot. My data frame looks like this:

head(results)

   Gene      Fold       pvalue          FDR      sig
1 ADORA2A 10.273854 1.164636e-28 2.234471e-24 FDR<0.05
2   IL23A  8.132293 4.554177e-28 4.368822e-24 FDR<0.05
3    IL1A  8.430078 6.768343e-27 4.328581e-23 FDR<0.05
4   CXCL6  7.102900 3.299464e-23 1.582588e-19 FDR<0.05
5    CCR7  8.950486 9.111421e-23 3.496235e-19 FDR<0.05
6  IL18R1  6.759646 7.283440e-22 2.329001e-18 FDR<0.05

I tried the code below to generate the volcano plot. The plot is generated successfully and the selected genes are marked, but they are labelled with seemingly random numbers instead of their gene names.

results$genelabels <- ""
results$genelabels <- ifelse(results$Gene == "IL23A" 
                             | results$Gene == "IL1A"
                             |results$Gene == "IL6"
                             |results$Gene == "CD80"
                             |results$Gene == "CD86"
                             |results$Gene == "NFKB"
                             |results$Gene == "BAFT2", TRUE,FALSE)
 ggplot(results) + geom_point(aes(Fold, -log10(FDR),col=sig))+ geom_text_repel(aes(Fold, -log10(FDR)),label = ifelse(results$genelabels == TRUE, results$Gene,""), box.padding = unit(0.45, "lines"),hjust=1) + theme(legend.title=element_blank(),text = element_text(size=20))+ scale_color_manual(values=c("red", "black"))

Can somebody tell me what is wrong with the code, and why it does not show the actual gene names on the volcano plot?

volcano plot ggplot2 R • 2.9k views
modified 15 months ago by zx8754 • written 16 months ago by saamar.rajput
16 months ago by
benformatics1.9k
ETH Zurich
benformatics1.9k wrote:

Your labels are of class factor and not character. If you wrap it with as.character in your plotting call it should work.

ggplot(results) + geom_point(aes(Fold, -log10(FDR),col=sig))+ geom_text_repel(aes(Fold, -log10(FDR)),label = ifelse(results$genelabels == TRUE, as.character(results$Gene),""), box.padding = unit(0.45, "lines"),hjust=1) + theme(legend.title=element_blank(),text = element_text(size=20))+ scale_color_manual(values=c("red", "black"))

Specifically change the following:

 ifelse(results$genelabels, results$Gene,"")

to

 ifelse(results$genelabels, as.character(results$Gene),"")
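
To see where the numbers come from, here is a minimal sketch on a toy vector (the names `genes`, `keep`, `bad` and `good` are made up for illustration and are not from the question's data). ifelse() does not keep the factor class of its `yes` argument, so the underlying integer level codes end up in the result:

```r
# Toy data: Gene stored as a factor, which is what read.delim() and
# data.frame() produced by default before R 4.0.0.
genes <- factor(c("ADORA2A", "IL23A", "IL1A", "CXCL6"))
keep  <- genes %in% c("IL23A", "IL1A")

# ifelse() drops the factor attributes, so the level codes leak through
# and are then turned into strings such as "4" or "3".
bad <- ifelse(keep, genes, "")

# Converting to character first keeps the gene names.
good <- ifelse(keep, as.character(genes), "")

print(bad)
print(good)
```

The exact numbers depend on how the factor levels are ordered, which is why they look random on a real gene table.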
modified 16 months ago • written 16 months ago by benformatics

Thank you so much. It worked !!

reply written 16 months ago by saamar.rajput
16 months ago by
SMK1.9k
SMK1.9k wrote:

Hi saamar.rajput,

See the answer from benformatics (credit goes to him). The key is to make sure your "Gene" column is of type character instead of factor.

> results <- read.delim("example.txt", header = TRUE, stringsAsFactors = FALSE)
> head(results)
     Gene      Fold       pvalue          FDR      sig
1 ADORA2A 10.273854 1.164636e-28 2.234471e-24 FDR<0.05
2   IL23A  8.132293 4.554177e-28 4.368822e-24 FDR<0.05
3    IL1A  8.430078 6.768343e-27 4.328581e-23 FDR<0.05
4   CXCL6  7.102900 3.299464e-23 1.582588e-19 FDR<0.05
5    CCR7  8.950486 9.111421e-23 3.496235e-19 FDR<0.05
6  IL18R1  6.759646 7.283440e-22 2.329001e-18 FDR<0.05
> str(results)
'data.frame':   6 obs. of  5 variables:
 $ Gene  : chr  "ADORA2A" "IL23A" "IL1A" "CXCL6" ...
 $ Fold  : num  10.27 8.13 8.43 7.1 8.95 ...
 $ pvalue: num  1.16e-28 4.55e-28 6.77e-27 3.30e-23 9.11e-23 ...
 $ FDR   : num  2.23e-24 4.37e-24 4.33e-23 1.58e-19 3.50e-19 ...
 $ sig   : chr  "FDR<0.05" "FDR<0.05" "FDR<0.05" "FDR<0.05" ...

> results$genelabels <- ""
> results$genelabels <- ifelse(results$Gene == "IL23A" 
+                              | results$Gene == "IL1A"
+                              | results$Gene == "IL6"
+                              | results$Gene == "CD80"
+                              | results$Gene == "CD86"
+                              | results$Gene == "NFKB"
+                              | results$Gene == "BAFT2", TRUE, FALSE)

> ggplot(results) +
+   geom_point(aes(Fold, -log10(FDR), col = sig)) +
+   geom_text_repel(
+     aes(Fold, -log10(FDR)),
+     label = ifelse(results$genelabels, results$Gene, ""),
+     box.padding = unit(0.45, "lines"),
+     hjust = 1
+   ) +
+   theme(legend.title = element_blank(), text = element_text(size = 20)) +
+   scale_color_manual(values = c("red", "black"))

[Image: volcano plot]
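
For anyone who wants to check the effect of stringsAsFactors without the original file, here is a small sketch. The inline table `txt` is a made-up two-row stand-in for example.txt, and `as_factor` and `as_chr` are illustrative names. Note that since R 4.0.0 the default is stringsAsFactors = FALSE, so on a recent R the problem in the question may not appear at all.

```r
# Hypothetical two-row stand-in for example.txt (tab-separated).
txt <- "Gene\tFold\nADORA2A\t10.27\nIL23A\t8.13\n"

# Same data read both ways; text= is used instead of a file path.
as_factor <- read.delim(text = txt, header = TRUE, stringsAsFactors = TRUE)
as_chr    <- read.delim(text = txt, header = TRUE, stringsAsFactors = FALSE)

print(class(as_factor$Gene))
print(class(as_chr$Gene))
```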

modified 16 months ago • written 16 months ago by SMK

Thank you so much :)

reply written 16 months ago by saamar.rajput
16 months ago by
zx87549.6k
London
zx87549.6k wrote:

As the other answers pointed out, your Gene column is a factor, and it gets converted to its integer codes by the ifelse() inside the ggplot call. Some solutions:

  1. Read the file with stringsAsFactors = FALSE (see @SMK's answer)
  2. Convert the column to character: results$Gene <- as.character(results$Gene) (see @benformatics's answer)
  3. If we wish to keep as factor, then redefine levels, see below:

# redefine levels:
results$genelabels <- factor(results$Gene, levels = c("IL23A", "IL1A","IL6", "CD80","CD86","NFKB","BAFT2"))

# or convert to character:
# results$genelabels <- ifelse(results$Gene %in% c("IL23A", "IL1A","IL6", "CD80","CD86","NFKB","BAFT2"), 
#                              as.character(results$Gene), NA)

ggplot(results, aes(Fold, -log10(FDR), label = genelabels, col = sig)) + 
  geom_point() +
  geom_text_repel(col = "black", na.rm = TRUE, box.padding = unit(0.45, "lines"), hjust = 1) + 
  scale_color_manual(values = c("red", "black")) +
  theme(legend.title = element_blank(), text = element_text(size = 20))
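
A quick way to see why the "redefine levels" variant works is to run it on a toy vector (`genes`, `wanted` and `labels` below are illustrative names, not from the question). Any value that is not among the given levels becomes NA, which is why the plotting call above sets na.rm = TRUE on the label layer:

```r
# Toy vector standing in for results$Gene.
genes  <- c("ADORA2A", "IL23A", "IL1A", "CXCL6")
wanted <- c("IL23A", "IL1A", "IL6")

# Genes outside `wanted` are not valid levels, so they turn into NA.
labels <- factor(genes, levels = wanted)

print(labels)
print(is.na(labels))
```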

Other notes:

  • avoid using $ within ggplot
  • use %in% instead of chain of OR conditions |
  • a comparison already returns a logical value, so there is no need for ifelse, e.g. ifelse(x == 1, TRUE, FALSE) is the same as x == 1
  • no need to "initiate" a new column: results$genelabels <- ""
  • readability: use spaces
  • readability: use line-breaks in ggplot for each layer after +.
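
The %in% and ifelse points can be checked on a toy vector. This is only a sketch with made-up names (`genes`, `chained`, `compact`). One caveat: the two forms agree here because the vector has no missing values; with NA present, == returns NA while %in% returns FALSE.

```r
# Toy vector standing in for results$Gene.
genes <- c("ADORA2A", "IL23A", "IL1A", "CXCL6")

# Chain of OR conditions wrapped in a redundant ifelse(), as in the question.
chained <- ifelse(genes == "IL23A" | genes == "IL1A" | genes == "IL6",
                  TRUE, FALSE)

# Same selection with %in%, which already returns a logical vector.
compact <- genes %in% c("IL23A", "IL1A", "IL6")

print(identical(chained, compact))
```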
written 16 months ago by zx8754

Nice tips! 👍 Thanks, zx8754.

reply written 16 months ago by SMK

Thank you very much, I will keep in mind the things you pointed out :)

reply written 16 months ago by saamar.rajput