Question: Label selected genes in volcano plot from ggplot
saamar.rajput wrote (21 months ago):

I have a data frame of differentially expressed genes from edgeR. I am trying to make a volcano plot from it, but I want only selected genes of interest to be labelled on the plot. My data frame looks like this:

head(results)

   Gene      Fold       pvalue          FDR      sig
1 ADORA2A 10.273854 1.164636e-28 2.234471e-24 FDR<0.05
2   IL23A  8.132293 4.554177e-28 4.368822e-24 FDR<0.05
3    IL1A  8.430078 6.768343e-27 4.328581e-23 FDR<0.05
4   CXCL6  7.102900 3.299464e-23 1.582588e-19 FDR<0.05
5    CCR7  8.950486 9.111421e-23 3.496235e-19 FDR<0.05
6  IL18R1  6.759646 7.283440e-22 2.329001e-18 FDR<0.05

I tried the code below to generate the volcano plot. The plot is generated successfully and the selected genes are marked, but they are labelled with seemingly random numbers instead of their gene names.

results$genelabels <- ""
results$genelabels <- ifelse(results$Gene == "IL23A" 
                             | results$Gene == "IL1A"
                             |results$Gene == "IL6"
                             |results$Gene == "CD80"
                             |results$Gene == "CD86"
                             |results$Gene == "NFKB"
                             |results$Gene == "BAFT2", TRUE,FALSE)
 ggplot(results) + geom_point(aes(Fold, -log10(FDR),col=sig))+ geom_text_repel(aes(Fold, -log10(FDR)),label = ifelse(results$genelabels == TRUE, results$Gene,""), box.padding = unit(0.45, "lines"),hjust=1) + theme(legend.title=element_blank(),text = element_text(size=20))+ scale_color_manual(values=c("red", "black"))

Can somebody help me work out what is wrong with the code, so that the volcano plot shows the actual gene names?

Tags: volcano plot, ggplot2, R
benformatics wrote (21 months ago):

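Your Gene column is of class factor, not character. ifelse() drops the factor attributes and returns the underlying integer codes, which is why numbers show up as labels. If you wrap the column in as.character() in your plotting call, it should work.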

ggplot(results) +
  geom_point(aes(Fold, -log10(FDR), col = sig)) +
  geom_text_repel(
    aes(Fold, -log10(FDR)),
    label = ifelse(results$genelabels == TRUE, as.character(results$Gene), ""),
    box.padding = unit(0.45, "lines"),
    hjust = 1
  ) +
  theme(legend.title = element_blank(), text = element_text(size = 20)) +
  scale_color_manual(values = c("red", "black"))

Specifically, change the following:

 ifelse(results$genelabels == TRUE, results$Gene, "")

to

 ifelse(results$genelabels == TRUE, as.character(results$Gene), "")
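
To see why the numbers appear, here is a minimal base R sketch. The toy vector `gene` is an assumption standing in for results$Gene (the gene names are taken from the question); it is not the asker's real data.

```r
# Toy factor standing in for results$Gene (hypothetical, three genes only).
gene <- factor(c("ADORA2A", "IL23A", "IL1A"))
selected <- gene %in% c("IL23A", "IL1A")

# ifelse() strips the factor attributes, so the "yes" branch yields the
# internal integer codes (as strings) instead of the gene names.
bad <- ifelse(selected, gene, "")

# Converting to character first keeps the actual names.
good <- ifelse(selected, as.character(gene), "")

print(bad)
print(good)
```

The values in `bad` are the positions of each gene in levels(gene), which is why the labels on the plot look like arbitrary numbers.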
AK wrote (21 months ago):

Hi saamar.rajput,

See the answer from benformatics (credit goes to him). The key is to make sure your "Gene" column is of type character rather than factor.

> results <- read.delim("example.txt", header = TRUE, stringsAsFactors = FALSE)
> head(results)
     Gene      Fold       pvalue          FDR      sig
1 ADORA2A 10.273854 1.164636e-28 2.234471e-24 FDR<0.05
2   IL23A  8.132293 4.554177e-28 4.368822e-24 FDR<0.05
3    IL1A  8.430078 6.768343e-27 4.328581e-23 FDR<0.05
4   CXCL6  7.102900 3.299464e-23 1.582588e-19 FDR<0.05
5    CCR7  8.950486 9.111421e-23 3.496235e-19 FDR<0.05
6  IL18R1  6.759646 7.283440e-22 2.329001e-18 FDR<0.05
> str(results)
'data.frame':   6 obs. of  5 variables:
 $ Gene  : chr  "ADORA2A" "IL23A" "IL1A" "CXCL6" ...
 $ Fold  : num  10.27 8.13 8.43 7.1 8.95 ...
 $ pvalue: num  1.16e-28 4.55e-28 6.77e-27 3.30e-23 9.11e-23 ...
 $ FDR   : num  2.23e-24 4.37e-24 4.33e-23 1.58e-19 3.50e-19 ...
 $ sig   : chr  "FDR<0.05" "FDR<0.05" "FDR<0.05" "FDR<0.05" ...

> results$genelabels <- ""
> results$genelabels <- ifelse(results$Gene == "IL23A" 
+                              | results$Gene == "IL1A"
+                              | results$Gene == "IL6"
+                              | results$Gene == "CD80"
+                              | results$Gene == "CD86"
+                              | results$Gene == "NFKB"
+                              | results$Gene == "BAFT2", TRUE, FALSE)

> ggplot(results) +
+   geom_point(aes(Fold, -log10(FDR), col = sig)) +
+   geom_text_repel(
+     aes(Fold, -log10(FDR)),
+     label = ifelse(results$genelabels, results$Gene, ""),
+     box.padding = unit(0.45, "lines"),
+     hjust = 1
+   ) +
+   theme(legend.title = element_blank(), text = element_text(size = 20)) +
+   scale_color_manual(values = c("red", "black"))

[Image: volcano plot]
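
As a small illustration of the effect of stringsAsFactors, the sketch below reads the same tab-separated text twice. The inline string `txt` is a made-up stand-in for example.txt, with values copied from the question. Note that since R 4.0.0 the default is already stringsAsFactors = FALSE, so the factor behaviour mainly shows up on older R versions or when the option is set explicitly.

```r
# Hypothetical two-row stand-in for example.txt (tab-separated).
txt <- "Gene\tFold\nADORA2A\t10.273854\nIL23A\t8.132293\n"

# Same input, read once with and once without conversion to factors.
as_factor <- read.delim(text = txt, header = TRUE, stringsAsFactors = TRUE)
as_char <- read.delim(text = txt, header = TRUE, stringsAsFactors = FALSE)

print(class(as_factor$Gene))
print(class(as_char$Gene))
```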


Thank you so much :)

(reply written 21 months ago by saamar.rajput)
zx8754 wrote (21 months ago):

As the other answers pointed out, your Gene column is a factor, and it is converted to its integer codes by ifelse() inside the ggplot call. Some possible solutions:

  1. Read the file with stringsAsFactors = FALSE (see @AK's answer)
  2. Convert the column to character: results$Gene <- as.character(results$Gene) (see @benformatics's answer)
  3. If we wish to keep it as a factor, redefine the levels, see below:

# redefine levels:
results$genelabels <- factor(results$Gene, levels = c("IL23A", "IL1A","IL6", "CD80","CD86","NFKB","BAFT2"))

# or convert to character:
# results$genelabels <- ifelse(results$Gene %in% c("IL23A", "IL1A","IL6", "CD80","CD86","NFKB","BAFT2"), 
#                              as.character(results$Gene), NA)

ggplot(results, aes(Fold, -log10(FDR), label = genelabels, col = sig)) + 
  geom_point() +
  geom_text_repel(col = "black", na.rm = TRUE, box.padding = unit(0.45, "lines"), hjust = 1) + 
  scale_color_manual(values = c("red", "black")) +
  theme(legend.title = element_blank(), text = element_text(size = 20))
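
A quick way to check what the "redefine levels" step produces, as a base R sketch. The toy `gene` vector and the shortened `genes_of_interest` list are assumptions for illustration only.

```r
# Toy stand-in for results$Gene (hypothetical, four genes only).
gene <- factor(c("ADORA2A", "IL23A", "IL1A", "CXCL6"))
genes_of_interest <- c("IL23A", "IL1A", "IL6")

# Values that are not among the new levels become NA, so only the
# selected genes keep a label; na.rm = TRUE then drops the rest quietly.
genelabels <- factor(gene, levels = genes_of_interest)

print(genelabels)
print(as.character(genelabels))
```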

Other notes:

  • avoid using $ within ggplot calls; refer to columns by name inside aes()
  • use %in% instead of a chain of OR conditions (|)
  • a comparison already returns a logical value, so there is no need for ifelse, e.g. ifelse(x == 1, TRUE, FALSE) is the same as x == 1
  • no need to "initialize" a new column with results$genelabels <- ""
  • readability: use spaces
  • readability: in ggplot, put a line break after each + so that every layer is on its own line.
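
The %in% and ifelse notes above can be checked with a short base R sketch. The toy `gene` vector is an assumption, not the real data.

```r
# Toy character vector standing in for results$Gene (hypothetical).
gene <- c("ADORA2A", "IL23A", "IL1A", "CXCL6")

# Chain of OR conditions wrapped in a redundant ifelse(), as in the question.
chained <- ifelse(gene == "IL23A" | gene == "IL1A" | gene == "IL6", TRUE, FALSE)

# Equivalent and shorter: %in% already returns a logical vector.
concise <- gene %in% c("IL23A", "IL1A", "IL6")

print(identical(chained, concise))
```

One difference to keep in mind: if the column contains NA, the == chain returns NA for those rows, while %in% returns FALSE.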

Nice tips! 👍 Thanks, zx8754.

(reply written 21 months ago by AK)

Thank you very much, I will keep in mind the things you pointed out :)

(reply written 21 months ago by saamar.rajput)