Question

clusterProfiler: Unable to plot results of compareCluster

3.6 years ago

benjytan1488 • 0

Hi all,

I am using the formula interface of compareCluster in clusterProfiler to look for enriched functional categories in each gene cluster. However, I am unable to plot the results and get an error message instead.

A reproducible example is shown below. Data table can be downloaded here: https://drive.google.com/file/d/18yPu7nwz9MX6O4Qngwo-BXIKIHqGS_Ge/view?usp=sharing

# Load packages
library(tidyverse)
library(clusterProfiler)
library(ReactomePA)  # provides enrichPathway (attached in the sessionInfo below)

# Get data
df <- read_tsv("deg_compareClust.tsv")

I then run compareCluster with the following parameters and draw a dot plot.

res <- compareCluster(ENTREZID~ID+expGrp, data = df, fun = "enrichPathway")
dotplot(res)

I expected a dot plot showing the enriched functional categories in each cluster, but I got the error message below:

Error in `levels<-`(`*tmp*`, value = as.character(levels)) : 
  factor level [80] is duplicated
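
For context, base R raises this kind of message whenever a factor is built from a set of levels that contains a repeated value. The snippet below is only a minimal sketch of that behaviour on made-up values; which levels dotplot actually constructs from the result is not shown in this thread.

```r
# Base R refuses to build a factor whose levels contain a repeated value.
# The values "a" and "b" are made up purely for illustration.
msg <- tryCatch(
  {
    factor(c("a", "b"), levels = c("a", "b", "a"))
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# Show the captured error message.
print(msg)
```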

The resulting data frame looks OK to me. I checked for duplicated items, but could not find any conflicting Cluster-Description pairs.

> head(as.data.frame(res))
           Cluster ID        expGrp
1 C3.downregulated C3 downregulated
2 C3.downregulated C3 downregulated
3 C3.downregulated C3 downregulated
4 C3.downregulated C3 downregulated
5 C3.downregulated C3 downregulated
6 C3.downregulated C3 downregulated
                                                        Description
1 L13a-mediated translational silencing of Ceruloplasmin expression
2           GTP hydrolysis and joining of the 60S ribosomal subunit
3                                 Eukaryotic Translation Initiation
4                              Cap-dependent Translation Initiation
5                                          Peptide chain elongation
6                                            Viral mRNA Translation
  GeneRatio   BgRatio       pvalue     p.adjust       qvalue
1     10/24 111/10654 1.731717e-14 1.230307e-12 8.478454e-13
2     10/24 112/10654 1.899202e-14 1.230307e-12 8.478454e-13
3     10/24 119/10654 3.540453e-14 1.230307e-12 8.478454e-13
4     10/24 119/10654 3.540453e-14 1.230307e-12 8.478454e-13
5      9/24  89/10654 1.545417e-13 3.580216e-12 2.467245e-12
6      9/24  89/10654 1.545417e-13 3.580216e-12 2.467245e-12
                                             geneID Count
1 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
2 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
3 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
4 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
5      6167/6206/6133/6161/6227/6228/6146/6156/6204     9
6      6167/6206/6133/6161/6227/6228/6146/6156/6204     9
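
A duplicate check of the kind described above can be sketched in base R as follows. Here `toy_res` is a hypothetical stand-in for `as.data.frame(res)`, and its second row is repeated on purpose so that the check has something to find.

```r
# Toy stand-in for as.data.frame(res); the second row deliberately repeats the first.
toy_res <- data.frame(
  Cluster = c("C3.downregulated", "C3.downregulated", "C3.downregulated"),
  Description = c("Peptide chain elongation",
                  "Peptide chain elongation",
                  "Viral mRNA Translation"),
  stringsAsFactors = FALSE
)

# TRUE marks a row whose Cluster-Description pair already occurred earlier.
dup <- duplicated(toy_res[, c("Cluster", "Description")])

# List the offending rows, if any.
print(toy_res[dup, ])
```

On the real result, an empty listing would mean that no Cluster-Description pair occurs twice.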

My sessionInfo() is as below:

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /usr/local/lib64/R/lib/libRblas.so
LAPACK: /usr/local/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=ja_JP.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=ja_JP.UTF-8        LC_COLLATE=ja_JP.UTF-8    
 [5] LC_MONETARY=ja_JP.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=ja_JP.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=ja_JP.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] fgsea_1.14.0           ReactomePA_1.32.0     
 [3] GO.db_3.11.4           org.Hs.eg.db_3.11.4   
 [5] AnnotationDbi_1.50.3   IRanges_2.22.2        
 [7] S4Vectors_0.26.1       Biobase_2.48.0        
 [9] BiocGenerics_0.34.0    clusterProfiler_3.17.4
[11] forcats_0.5.0          stringr_1.4.0         
[13] dplyr_1.0.2            purrr_0.3.4           
[15] readr_1.3.1            tidyr_1.1.2           
[17] tibble_3.0.3           ggplot2_3.3.2         
[19] tidyverse_1.3.0       

loaded via a namespace (and not attached):
  [1] ggnewscale_0.4.3    ggbeeswarm_0.6.0    colorspace_1.4-1   
  [4] ggridges_0.5.2      ellipsis_0.3.1      rio_0.5.16         
  [7] qvalue_2.20.0       fs_1.5.0            rstudioapi_0.11    
 [10] farver_2.0.3        graphlayouts_0.7.0  ggrepel_0.8.2      
 [13] bit64_4.0.5         scatterpie_0.1.5    fansi_0.4.1        
 [16] lubridate_1.7.9     xml2_1.3.2          splines_4.0.2      
 [19] GOSemSim_2.14.2     polyclip_1.10-0     jsonlite_1.7.1     
 [22] broom_0.7.0         dbplyr_1.4.4        graph_1.66.0       
 [25] pheatmap_1.0.12     graphite_1.34.0     ggforce_0.3.2      
 [28] BiocManager_1.30.10 compiler_4.0.2      httr_1.4.2         
 [31] rvcheck_0.1.8       backports_1.1.9     assertthat_0.2.1   
 [34] Matrix_1.2-18       cli_2.0.2           tweenr_1.0.1       
 [37] tools_4.0.2         igraph_1.2.5        gtable_0.3.0       
 [40] glue_1.4.2          reshape2_1.4.4      DO.db_2.9          
 [43] rappdirs_0.3.1      ggthemes_4.2.0      fastmatch_1.1-0    
 [46] Rcpp_1.0.5          enrichplot_1.9.3    carData_3.0-4      
 [49] cellranger_1.1.0    vctrs_0.3.4         ggraph_2.0.3       
 [52] openxlsx_4.1.5      rvest_0.3.6         lifecycle_0.2.0    
 [55] rstatix_0.6.0       DOSE_3.14.0         MASS_7.3-53        
 [58] scales_1.1.1        tidygraph_1.2.0     reactome.db_1.70.0 
 [61] hms_0.5.3           RColorBrewer_1.1-2  yaml_2.2.1         
 [64] curl_4.3            memoise_1.1.0       gridExtra_2.3      
 [67] downloader_0.4      stringi_1.5.3       RSQLite_2.2.0      
 [70] checkmate_2.0.0     zip_2.1.1           BiocParallel_1.22.0
 [73] rlang_0.4.7         pkgconfig_2.0.3     lattice_0.20-41    
 [76] shadowtext_0.0.7    cowplot_1.1.0       bit_4.0.4          
 [79] tidyselect_1.1.0    plyr_1.8.6          magrittr_1.5       
 [82] R6_2.4.1            generics_0.0.2      DBI_1.1.0          
 [85] pillar_1.4.6        haven_2.3.1         foreign_0.8-80     
 [88] withr_2.2.0         abind_1.4-5         modelr_0.1.8       
 [91] crayon_1.3.4        car_3.0-9           viridis_0.5.1      
 [94] grid_4.0.2          readxl_1.3.1        data.table_1.13.0  
 [97] blob_1.2.1          reprex_0.3.0        digest_0.6.25      
[100] gridGraphics_0.5-0  munsell_0.5.0       beeswarm_0.2.3     
[103] viridisLite_0.3.0   ggplotify_0.0.5     vipor_0.4.5

A similar post was seen on GitHub but no feedback was given: https://github.com/YuLab-SMU/clusterProfiler/issues/116

I have made sure I'm using the latest version, checked the clusterProfiler vignettes, and googled the issue, but could not find a solution. Please advise on how to proceed.

Thank you very much!

score 2 · Accepted Answer · 2020-09-15

3.6 years ago

huerqiang ▴ 30

This is because your data contains a column named "ID". Rename that column and use the new name in the formula.

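df <- read_tsv("deg_compareClust.tsv")
# Rename the "ID" column by name rather than by position
colnames(df)[colnames(df) == "ID"] <- "group1"
# Use the new column name in the formula
res <- compareCluster(ENTREZID~group1+expGrp, data = df, fun = "enrichPathway")
dotplot(res)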
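
The same idea can be generalised into a check that runs before compareCluster is called. This is only a sketch: the `reserved` vector simply lists the column names visible in the result printed in this thread (treating all of them as off-limits is an assumption, not a documented rule), and `rename_clashing` is a hypothetical helper, not part of clusterProfiler.

```r
# Column names that appear in the compareCluster result printed in this thread.
# Treating every one of them as reserved is an assumption, not a documented list.
reserved <- c("Cluster", "ID", "Description", "GeneRatio", "BgRatio",
              "pvalue", "p.adjust", "qvalue", "geneID", "Count")

# Hypothetical helper: append a suffix to any input column whose name clashes.
rename_clashing <- function(data, reserved, suffix = "_input") {
  clash <- names(data) %in% reserved
  names(data)[clash] <- paste0(names(data)[clash], suffix)
  data
}

# Toy stand-in for the real table (values are made up for illustration).
toy <- data.frame(ENTREZID = c("6167", "6206"),
                  expGrp   = c("downregulated", "downregulated"),
                  ID       = c("C3", "C3"),
                  stringsAsFactors = FALSE)

safe <- rename_clashing(toy, reserved)
print(names(safe))
```

With the renamed table, the formula would refer to `ID_input` instead of `ID`.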

Thank you very much for the solution. It worked! But I'm curious, why does the "ID" column interfere with the plotting? I don't see any duplicated columns in the data frame...

Reply • 3.6 years ago by benjytan1488 • 0

That's because the compareClusterResult slot of the compareCluster result has its own "ID" column (the pathway identifier), in addition to the grouping columns taken from the input file ("group1" and "expGrp" here). If your input file also has a column named "ID", the two conflict.

> res@compareClusterResult[1:5, 1:5]
           Cluster group1        expGrp           ID                                                       Description
1 C3.downregulated     C3 downregulated R-HSA-156827 L13a-mediated translational silencing of Ceruloplasmin expression
2 C3.downregulated     C3 downregulated  R-HSA-72706           GTP hydrolysis and joining of the 60S ribosomal subunit
3 C3.downregulated     C3 downregulated  R-HSA-72613                                 Eukaryotic Translation Initiation
4 C3.downregulated     C3 downregulated  R-HSA-72737                              Cap-dependent Translation Initiation
5 C3.downregulated     C3 downregulated R-HSA-156902                                          Peptide chain elongation
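
The effect of such a name clash can be illustrated with a toy example in base R. This is not clusterProfiler's actual code (the thread does not show how the package attaches the grouping columns); `add_groups` is a hypothetical helper that attaches grouping variables to a result table by column name.

```r
# Toy enrichment rows: "ID" holds the pathway identifier, as in the output above.
enrich <- data.frame(
  ID = c("R-HSA-156827", "R-HSA-72706"),
  Description = c("L13a-mediated translational silencing of Ceruloplasmin expression",
                  "GTP hydrolysis and joining of the 60S ribosomal subunit"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: attach grouping variables by column name.
add_groups <- function(result, groups) {
  for (nm in names(groups)) result[[nm]] <- groups[[nm]]
  result
}

# Grouping variable named "ID": it overwrites the pathway identifiers.
clash <- add_groups(enrich, list(ID = "C3", expGrp = "downregulated"))

# Grouping variable named "group1": both pieces of information survive.
ok <- add_groups(enrich, list(group1 = "C3", expGrp = "downregulated"))

print(clash)
print(ok)
```

In the clashing case the pathway identifiers are gone and "ID" holds the cluster label, which matches the table in the question, where "ID" contains C3 and no R-HSA identifiers appear.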

Reply • 3.6 years ago by huerqiang ▴ 30

I see... I didn't realize that the result table has its own "ID" column, which was missing from mine because I had used the name "ID" in my data. Thank you very much again for your solution!

Reply • 3.6 years ago by benjytan1488 • 0