Question: Label selected genes in volcano plot from ggplot
saamar.rajput (Germany) wrote, 6 months ago:

I have a data frame of differentially expressed genes from edgeR. I am trying to make a volcano plot from it, but I want only selected genes of interest to be labelled on the plot. My data frame looks like this:

head(results)

   Gene      Fold       pvalue          FDR      sig
1 ADORA2A 10.273854 1.164636e-28 2.234471e-24 FDR<0.05
2   IL23A  8.132293 4.554177e-28 4.368822e-24 FDR<0.05
3    IL1A  8.430078 6.768343e-27 4.328581e-23 FDR<0.05
4   CXCL6  7.102900 3.299464e-23 1.582588e-19 FDR<0.05
5    CCR7  8.950486 9.111421e-23 3.496235e-19 FDR<0.05
6  IL18R1  6.759646 7.283440e-22 2.329001e-18 FDR<0.05

I tried the code below to generate the volcano plot. The plot is drawn and the selected genes are labelled, but the labels are numbers rather than the gene names.

results$genelabels <- ""
results$genelabels <- ifelse(results$Gene == "IL23A" 
                             | results$Gene == "IL1A"
                             |results$Gene == "IL6"
                             |results$Gene == "CD80"
                             |results$Gene == "CD86"
                             |results$Gene == "NFKB"
                             |results$Gene == "BAFT2", TRUE,FALSE)
 ggplot(results) + geom_point(aes(Fold, -log10(FDR),col=sig))+ geom_text_repel(aes(Fold, -log10(FDR)),label = ifelse(results$genelabels == TRUE, results$Gene,""), box.padding = unit(0.45, "lines"),hjust=1) + theme(legend.title=element_blank(),text = element_text(size=20))+ scale_color_manual(values=c("red", "black"))

Can somebody tell me what is wrong with the code, and why it does not show the gene names on the volcano plot?

volcano plot ggplot2 R • 646 views
benformatics (ETH Zurich) wrote, 6 months ago:

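Your Gene column is of class factor, not character. ifelse() drops the factor attributes and returns the underlying integer codes, which is why numbers appear as labels. If you wrap the column in as.character() in your plotting call, it should work: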

library(ggplot2)
library(ggrepel)

ggplot(results) +
  geom_point(aes(Fold, -log10(FDR), col = sig)) +
  geom_text_repel(
    aes(Fold, -log10(FDR)),
    # as.character() keeps the gene names instead of the factor codes
    label = ifelse(results$genelabels, as.character(results$Gene), ""),
    box.padding = unit(0.45, "lines"),
    hjust = 1
  ) +
  theme(legend.title = element_blank(), text = element_text(size = 20)) +
  scale_color_manual(values = c("red", "black"))

Specifically change the following:

 ifelse(results$genelabels, results$Gene,"")

to

 ifelse(results$genelabels, as.character(results$Gene),"")
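
The effect can be reproduced outside ggplot. The following is a minimal base-R sketch using a made-up four-gene factor; the names gene, keep, bad and good are illustrative and not from the original post. It shows ifelse() returning the factor's integer codes unless the factor is converted to character first.

```r
# Toy factor standing in for results$Gene (illustrative data only)
gene <- factor(c("ADORA2A", "IL23A", "IL1A", "CXCL6"))
keep <- gene %in% c("IL23A", "IL1A")

# ifelse() discards the factor attributes, so the labels become
# the underlying integer codes coerced to strings
bad <- ifelse(keep, gene, "")

# Converting to character first keeps the gene names
good <- ifelse(keep, as.character(gene), "")

print(bad)
print(good)
```

Comparing the two printed vectors should make it clear where the "random numbers" on the plot came from: they are the positions of the genes in levels(gene).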

Thank you so much. It worked !!

(reply written 6 months ago by saamar.rajput)
SMK wrote, 6 months ago:

Hi saamar.rajput,

See the answer from benformatics (credit goes to him). The key is to make sure your "Gene" column is of type character rather than factor.

> results <- read.delim("example.txt", header = TRUE, stringsAsFactors = FALSE)
> head(results)
     Gene      Fold       pvalue          FDR      sig
1 ADORA2A 10.273854 1.164636e-28 2.234471e-24 FDR<0.05
2   IL23A  8.132293 4.554177e-28 4.368822e-24 FDR<0.05
3    IL1A  8.430078 6.768343e-27 4.328581e-23 FDR<0.05
4   CXCL6  7.102900 3.299464e-23 1.582588e-19 FDR<0.05
5    CCR7  8.950486 9.111421e-23 3.496235e-19 FDR<0.05
6  IL18R1  6.759646 7.283440e-22 2.329001e-18 FDR<0.05
> str(results)
'data.frame':   6 obs. of  5 variables:
 $ Gene  : chr  "ADORA2A" "IL23A" "IL1A" "CXCL6" ...
 $ Fold  : num  10.27 8.13 8.43 7.1 8.95 ...
 $ pvalue: num  1.16e-28 4.55e-28 6.77e-27 3.30e-23 9.11e-23 ...
 $ FDR   : num  2.23e-24 4.37e-24 4.33e-23 1.58e-19 3.50e-19 ...
 $ sig   : chr  "FDR<0.05" "FDR<0.05" "FDR<0.05" "FDR<0.05" ...

> results$genelabels <- ""
> results$genelabels <- ifelse(results$Gene == "IL23A" 
+                              | results$Gene == "IL1A"
+                              | results$Gene == "IL6"
+                              | results$Gene == "CD80"
+                              | results$Gene == "CD86"
+                              | results$Gene == "NFKB"
+                              | results$Gene == "BAFT2", TRUE, FALSE)

> ggplot(results) +
+   geom_point(aes(Fold, -log10(FDR), col = sig)) +
+   geom_text_repel(
+     aes(Fold, -log10(FDR)),
+     label = ifelse(results$genelabels, results$Gene, ""),
+     box.padding = unit(0.45, "lines"),
+     hjust = 1
+   ) +
+   theme(legend.title = element_blank(), text = element_text(size = 20)) +
+   scale_color_manual(values = c("red", "black"))

[Image: volcano plot]
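
To check which type a column ends up with, something like the following sketch may help. It reads a small tab-separated string (a stand-in for example.txt, with made-up rounded values) both ways; the object names txt, as_factor and as_char are illustrative. Note that since R 4.0.0 the default is stringsAsFactors = FALSE, so the factor behaviour mainly shows up in older R versions or when the option is set explicitly.

```r
# Tab-separated toy input standing in for example.txt (illustrative data only)
txt <- "Gene\tFold\nADORA2A\t10.27\nIL23A\t8.13\nIL1A\t8.43\n"

# Same input read twice, differing only in stringsAsFactors
as_factor <- read.delim(text = txt, header = TRUE, stringsAsFactors = TRUE)
as_char   <- read.delim(text = txt, header = TRUE, stringsAsFactors = FALSE)

class(as_factor$Gene)  # "factor"
class(as_char$Gene)    # "character"
```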


Thank you so much :)

(reply written 6 months ago by saamar.rajput)
zx8754 (London) wrote, 6 months ago:

As the other answers pointed out, your Gene column is a factor, and ifelse() inside the ggplot call converts it to its integer codes. Some possible solutions:

  1. Read the file with stringsAsFactors = FALSE (see @SMK's answer)
  2. Convert the column to character: results$Gene <- as.character(results$Gene) (see @benformatics's answer)
  3. If we wish to keep as factor, then redefine levels, see below:

# redefine levels:
results$genelabels <- factor(results$Gene, levels = c("IL23A", "IL1A","IL6", "CD80","CD86","NFKB","BAFT2"))

# or convert to character:
# results$genelabels <- ifelse(results$Gene %in% c("IL23A", "IL1A","IL6", "CD80","CD86","NFKB","BAFT2"), 
#                              as.character(results$Gene), NA)

ggplot(results, aes(Fold, -log10(FDR), label = genelabels, col = sig)) + 
  geom_point() +
  geom_text_repel(col = "black", na.rm = TRUE, box.padding = unit(0.45, "lines"), hjust = 1) + 
  scale_color_manual(values = c("red", "black")) +
  theme(legend.title = element_blank(), text = element_text(size = 20))
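
The reason redefining the levels works can be seen in a small base-R sketch (toy data; gene, genes_of_interest and labels are illustrative names): any value that is not among the new levels becomes NA, which the label layer above then drops via na.rm = TRUE.

```r
# Toy factor standing in for results$Gene (illustrative data only)
gene <- factor(c("ADORA2A", "IL23A", "IL1A", "CXCL6"))
genes_of_interest <- c("IL23A", "IL1A", "IL6")

# Values absent from the new levels are turned into NA
labels <- factor(gene, levels = genes_of_interest)

as.character(labels)  # NA "IL23A" "IL1A" NA
levels(labels)        # "IL23A" "IL1A" "IL6"
```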

Other notes:

  • avoid using $ within ggplot
  • use %in% instead of chain of OR conditions |
  • a comparison already returns a logical value, so there is no need for ifelse, e.g. ifelse(x == 1, TRUE, FALSE) is the same as x == 1
  • no need to "initialize" a new column: results$genelabels <- ""
  • readability: use spaces
  • readability: use line-breaks in ggplot for each layer after +.
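
Putting these notes together, the labelling step might look like the sketch below. It rebuilds the six rows shown in the question with Gene as character; the names genes_of_interest and is_selected, and the requireNamespace() guard around the plot, are additions for this illustration rather than part of any answer above.

```r
# First rows of the question's table, with Gene stored as character
results <- data.frame(
  Gene = c("ADORA2A", "IL23A", "IL1A", "CXCL6", "CCR7", "IL18R1"),
  Fold = c(10.273854, 8.132293, 8.430078, 7.102900, 8.950486, 6.759646),
  FDR  = c(2.234471e-24, 4.368822e-24, 4.328581e-23,
           1.582588e-19, 3.496235e-19, 2.329001e-18),
  sig  = "FDR<0.05",
  stringsAsFactors = FALSE
)

genes_of_interest <- c("IL23A", "IL1A", "IL6", "CD80", "CD86", "NFKB", "BAFT2")

# %in% already returns a logical vector: no chain of |, no ifelse(..., TRUE, FALSE)
is_selected <- results$Gene %in% genes_of_interest

# Unselected genes get NA so the label layer can drop them via na.rm = TRUE
results$genelabels <- ifelse(is_selected, results$Gene, NA_character_)

# Build the plot only if both packages are installed (guard added for this sketch)
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggrepel", quietly = TRUE)) {
  library(ggplot2)
  library(ggrepel)
  p <- ggplot(results, aes(Fold, -log10(FDR), label = genelabels)) +
    geom_point(aes(col = sig)) +
    geom_text_repel(na.rm = TRUE)
  # print(p) draws the plot
}
```

Because the label column lives in the data frame and is mapped inside aes(), no $ is needed within the ggplot call.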

Nice tips! 👍 Thanks, zx8754.

(reply written 6 months ago by SMK)

Thank you very much, I will keep in mind the things you pointed out :)

(reply written 6 months ago by saamar.rajput)