Question: clusterprofiler: Unable to plot results of compareCluster
benjytan1488 wrote, 10 weeks ago:

Hi all,

I am using the formula interface of compareCluster in clusterProfiler to look for enriched functional categories in each gene cluster. However, I am unable to plot the results and receive an error message instead.

A reproducible example is shown below. Data table can be downloaded here: https://drive.google.com/file/d/18yPu7nwz9MX6O4Qngwo-BXIKIHqGS_Ge/view?usp=sharing

# Load packages
library(tidyverse)
library(clusterProfiler)
library(ReactomePA)  # provides enrichPathway, which is passed to compareCluster

# Get data
df <- read_tsv("deg_compareClust.tsv")

I then run compareCluster with the following parameters and draw a dot plot.

res <- compareCluster(ENTREZID~ID+expGrp, data = df, fun = "enrichPathway")
dotplot(res)

I expected a dot plot of the enriched functional categories in each cluster, but got the error message below:

Error in `levels<-`(`*tmp*`, value = as.character(levels)) : 
  factor level [80] is duplicated
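
For context, this message is raised by base R's factor() whenever the vector passed as levels contains the same value more than once. The thread does not show how dotplot ends up with such a vector, so the sketch below only reproduces the base R behaviour with made-up values; it is not the plotting code itself.

```r
# Hypothetical levels vector in which "a" appears twice
lv <- c("a", "b", "a")

# factor() rejects repeated levels; capture the message instead of stopping
msg <- tryCatch(
  {
    factor(c("a", "b"), levels = lv)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)

# duplicated() flags the offending entries
print(which(duplicated(lv)))
```

Running duplicated() on whatever vector is about to become the factor levels is usually the quickest way to see which entry repeats.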

The resulting data frame looks OK to me. I checked it for duplicated items, but could not identify any conflicting Cluster-Description pairs.

> head(as.data.frame(res))
           Cluster ID        expGrp
1 C3.downregulated C3 downregulated
2 C3.downregulated C3 downregulated
3 C3.downregulated C3 downregulated
4 C3.downregulated C3 downregulated
5 C3.downregulated C3 downregulated
6 C3.downregulated C3 downregulated
                                                        Description
1 L13a-mediated translational silencing of Ceruloplasmin expression
2           GTP hydrolysis and joining of the 60S ribosomal subunit
3                                 Eukaryotic Translation Initiation
4                              Cap-dependent Translation Initiation
5                                          Peptide chain elongation
6                                            Viral mRNA Translation
  GeneRatio   BgRatio       pvalue     p.adjust       qvalue
1     10/24 111/10654 1.731717e-14 1.230307e-12 8.478454e-13
2     10/24 112/10654 1.899202e-14 1.230307e-12 8.478454e-13
3     10/24 119/10654 3.540453e-14 1.230307e-12 8.478454e-13
4     10/24 119/10654 3.540453e-14 1.230307e-12 8.478454e-13
5      9/24  89/10654 1.545417e-13 3.580216e-12 2.467245e-12
6      9/24  89/10654 1.545417e-13 3.580216e-12 2.467245e-12
                                             geneID Count
1 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
2 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
3 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
4 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
5      6167/6206/6133/6161/6227/6228/6146/6156/6204     9
6      6167/6206/6133/6161/6227/6228/6146/6156/6204     9

My sessionInfo() is as below:

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /usr/local/lib64/R/lib/libRblas.so
LAPACK: /usr/local/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=ja_JP.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=ja_JP.UTF-8        LC_COLLATE=ja_JP.UTF-8    
 [5] LC_MONETARY=ja_JP.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=ja_JP.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=ja_JP.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] fgsea_1.14.0           ReactomePA_1.32.0     
 [3] GO.db_3.11.4           org.Hs.eg.db_3.11.4   
 [5] AnnotationDbi_1.50.3   IRanges_2.22.2        
 [7] S4Vectors_0.26.1       Biobase_2.48.0        
 [9] BiocGenerics_0.34.0    clusterProfiler_3.17.4
[11] forcats_0.5.0          stringr_1.4.0         
[13] dplyr_1.0.2            purrr_0.3.4           
[15] readr_1.3.1            tidyr_1.1.2           
[17] tibble_3.0.3           ggplot2_3.3.2         
[19] tidyverse_1.3.0       

loaded via a namespace (and not attached):
  [1] ggnewscale_0.4.3    ggbeeswarm_0.6.0    colorspace_1.4-1   
  [4] ggridges_0.5.2      ellipsis_0.3.1      rio_0.5.16         
  [7] qvalue_2.20.0       fs_1.5.0            rstudioapi_0.11    
 [10] farver_2.0.3        graphlayouts_0.7.0  ggrepel_0.8.2      
 [13] bit64_4.0.5         scatterpie_0.1.5    fansi_0.4.1        
 [16] lubridate_1.7.9     xml2_1.3.2          splines_4.0.2      
 [19] GOSemSim_2.14.2     polyclip_1.10-0     jsonlite_1.7.1     
 [22] broom_0.7.0         dbplyr_1.4.4        graph_1.66.0       
 [25] pheatmap_1.0.12     graphite_1.34.0     ggforce_0.3.2      
 [28] BiocManager_1.30.10 compiler_4.0.2      httr_1.4.2         
 [31] rvcheck_0.1.8       backports_1.1.9     assertthat_0.2.1   
 [34] Matrix_1.2-18       cli_2.0.2           tweenr_1.0.1       
 [37] tools_4.0.2         igraph_1.2.5        gtable_0.3.0       
 [40] glue_1.4.2          reshape2_1.4.4      DO.db_2.9          
 [43] rappdirs_0.3.1      ggthemes_4.2.0      fastmatch_1.1-0    
 [46] Rcpp_1.0.5          enrichplot_1.9.3    carData_3.0-4      
 [49] cellranger_1.1.0    vctrs_0.3.4         ggraph_2.0.3       
 [52] openxlsx_4.1.5      rvest_0.3.6         lifecycle_0.2.0    
 [55] rstatix_0.6.0       DOSE_3.14.0         MASS_7.3-53        
 [58] scales_1.1.1        tidygraph_1.2.0     reactome.db_1.70.0 
 [61] hms_0.5.3           RColorBrewer_1.1-2  yaml_2.2.1         
 [64] curl_4.3            memoise_1.1.0       gridExtra_2.3      
 [67] downloader_0.4      stringi_1.5.3       RSQLite_2.2.0      
 [70] checkmate_2.0.0     zip_2.1.1           BiocParallel_1.22.0
 [73] rlang_0.4.7         pkgconfig_2.0.3     lattice_0.20-41    
 [76] shadowtext_0.0.7    cowplot_1.1.0       bit_4.0.4          
 [79] tidyselect_1.1.0    plyr_1.8.6          magrittr_1.5       
 [82] R6_2.4.1            generics_0.0.2      DBI_1.1.0          
 [85] pillar_1.4.6        haven_2.3.1         foreign_0.8-80     
 [88] withr_2.2.0         abind_1.4-5         modelr_0.1.8       
 [91] crayon_1.3.4        car_3.0-9           viridis_0.5.1      
 [94] grid_4.0.2          readxl_1.3.1        data.table_1.13.0  
 [97] blob_1.2.1          reprex_0.3.0        digest_0.6.25      
[100] gridGraphics_0.5-0  munsell_0.5.0       beeswarm_0.2.3     
[103] viridisLite_0.3.0   ggplotify_0.0.5     vipor_0.4.5

A similar issue was posted on GitHub but received no feedback: https://github.com/YuLab-SMU/clusterProfiler/issues/116

I have made sure I'm using the latest version, checked the clusterProfiler vignettes and googled the issue, but could not find a solution. Please advise on how to proceed.

Thank you very much!

huerqiang (Southern Medical University, Guangzhou, China) wrote, 10 weeks ago:

This is because there is an "ID" column in your data. Just change the column name.

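df <- read_tsv("deg_compareClust.tsv")
# Rename the input's "ID" column (the third column in this file)
colnames(df)[3] <- "group1"
# Use the new column name in the formula
res <- compareCluster(ENTREZID~group1+expGrp, data = df, fun = "enrichPathway")
dotplot(res)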
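
The rename can also be done by name rather than by position, which keeps working if the column order of the input file changes. This is a minimal base R sketch on a small hypothetical data frame standing in for the real table; only the column names ENTREZID, expGrp, ID and group1 come from the thread.

```r
# Hypothetical stand-in for the table read from deg_compareClust.tsv
df <- data.frame(
  ENTREZID = c("6167", "6206"),
  expGrp   = c("downregulated", "downregulated"),
  ID       = c("C3", "C3")
)

# Rename whichever column is called "ID", wherever it sits
names(df)[names(df) == "ID"] <- "group1"

print(names(df))
```

With the tidyverse loaded, dplyr::rename(df, group1 = ID) would be an equivalent way to write the same step.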

Thank you very much for the solution. It worked! But I'm curious, why does the "ID" column interfere with the plotting? I don't see any duplicated columns in the data frame...

written 10 weeks ago by benjytan1488

That's because the compareClusterResult slot of the compareCluster result already has its own "ID" column (the pathway identifier), alongside the grouping columns taken from the input file ("group1" and "expGrp" here). If the input file also has a column named "ID", the two conflict.

> res@compareClusterResult[1:5, 1:5]
           Cluster group1        expGrp           ID                                                       Description
1 C3.downregulated     C3 downregulated R-HSA-156827 L13a-mediated translational silencing of Ceruloplasmin expression
2 C3.downregulated     C3 downregulated  R-HSA-72706           GTP hydrolysis and joining of the 60S ribosomal subunit
3 C3.downregulated     C3 downregulated  R-HSA-72613                                 Eukaryotic Translation Initiation
4 C3.downregulated     C3 downregulated  R-HSA-72737                              Cap-dependent Translation Initiation
5 C3.downregulated     C3 downregulated R-HSA-156902                                          Peptide chain elongation
written 10 weeks ago by huerqiang
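
A clash like this can be caught before running the enrichment by comparing the grouping variables in the formula with the column names the result is known to carry. The sketch below is only an illustration: the list of result columns is copied from the output printed in this thread and may differ in other package versions, and the variable names result_cols, grouping_vars and clashes are made up for the example.

```r
# Column names seen in the compareCluster output shown in this thread
result_cols <- c("Cluster", "ID", "Description", "GeneRatio", "BgRatio",
                 "pvalue", "p.adjust", "qvalue", "geneID", "Count")

# The formula from the original question
fml <- ENTREZID ~ ID + expGrp

# all.vars() lists the response first, then the grouping variables
grouping_vars <- all.vars(fml)[-1]

# Any grouping variable that shares a name with a result column is a clash
clashes <- intersect(grouping_vars, result_cols)

if (length(clashes) > 0) {
  message("Rename these input columns before calling compareCluster: ",
          paste(clashes, collapse = ", "))
}
```

For the formula in the question the check reports "ID"; with the renamed formula ENTREZID ~ group1 + expGrp it reports nothing.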

I see... I didn't realize the result table has its own "ID" column; it was missing from my output because I had used the name "ID" for my own data. Thank you very much again for your solution!

written 10 weeks ago by benjytan1488