clusterprofiler: Unable to plot results of compareCluster
8 months ago

Hi all,

I am using the formula interface of compareCluster in clusterProfiler to look for enriched functional categories in each gene cluster. However, when I try to plot the results, I get an error message.

A reproducible example is shown below. The data table can be downloaded here:

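# Load packages
library(readr)            # read_tsv()
library(clusterProfiler)  # compareCluster(), dotplot()
library(ReactomePA)       # enrichPathway()

# Get data
df <- read_tsv("deg_compareClust.tsv")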

I then run compareCluster with the following parameters and draw a dot plot:

res <- compareCluster(ENTREZID~ID+expGrp, data = df, fun = "enrichPathway")
dotplot(res)

I expected a dot plot of the enriched functional categories in each cluster, but instead I got the following error:

Error in `levels<-`(`*tmp*`, value = as.character(levels)) : 
  factor level [80] is duplicated

The resulting data frame looks OK to me. I checked for duplicated entries but could not find any conflicting Cluster/Description pairs.

> head(
           Cluster ID        expGrp
1 C3.downregulated C3 downregulated
2 C3.downregulated C3 downregulated
3 C3.downregulated C3 downregulated
4 C3.downregulated C3 downregulated
5 C3.downregulated C3 downregulated
6 C3.downregulated C3 downregulated
1 L13a-mediated translational silencing of Ceruloplasmin expression
2           GTP hydrolysis and joining of the 60S ribosomal subunit
3                                 Eukaryotic Translation Initiation
4                              Cap-dependent Translation Initiation
5                                          Peptide chain elongation
6                                            Viral mRNA Translation
  GeneRatio   BgRatio       pvalue     p.adjust       qvalue
1     10/24 111/10654 1.731717e-14 1.230307e-12 8.478454e-13
2     10/24 112/10654 1.899202e-14 1.230307e-12 8.478454e-13
3     10/24 119/10654 3.540453e-14 1.230307e-12 8.478454e-13
4     10/24 119/10654 3.540453e-14 1.230307e-12 8.478454e-13
5      9/24  89/10654 1.545417e-13 3.580216e-12 2.467245e-12
6      9/24  89/10654 1.545417e-13 3.580216e-12 2.467245e-12
                                             geneID Count
1 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
2 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
3 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
4 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
5      6167/6206/6133/6161/6227/6228/6146/6156/6204     9
6      6167/6206/6133/6161/6227/6228/6146/6156/6204     9
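For context on the error itself: the message "factor level [N] is duplicated" comes from base R's `levels<-`, which factor() calls. It appears when the vector passed as `levels` contains the same value more than once. So somewhere in the plotting step, a factor is being built from a level vector with a repeated entry, even if the rows themselves look unique. The minimal sketch below reproduces the message with made-up pathway names. It does not show the actual code path inside clusterProfiler/enrichplot.

```r
# Toy values (hypothetical): two distinct pathway names...
x <- c("pathway A", "pathway B")

# ...but a `levels` vector that repeats one of them.
# factor() passes this to `levels<-`, which refuses duplicated levels.
msg <- tryCatch(
  factor(x, levels = c("pathway A", "pathway B", "pathway A")),
  error = function(e) conditionMessage(e)
)
print(msg)  # the "factor level [...] is duplicated" message
```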

My sessionInfo() output is below:

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /usr/local/lib64/R/lib/
LAPACK: /usr/local/lib64/R/lib/

 [1] LC_CTYPE=ja_JP.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=ja_JP.UTF-8        LC_COLLATE=ja_JP.UTF-8    
 [7] LC_PAPER=ja_JP.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] fgsea_1.14.0           ReactomePA_1.32.0     
 [3] GO.db_3.11.4    
 [5] AnnotationDbi_1.50.3   IRanges_2.22.2        
 [7] S4Vectors_0.26.1       Biobase_2.48.0        
 [9] BiocGenerics_0.34.0    clusterProfiler_3.17.4
[11] forcats_0.5.0          stringr_1.4.0         
[13] dplyr_1.0.2            purrr_0.3.4           
[15] readr_1.3.1            tidyr_1.1.2           
[17] tibble_3.0.3           ggplot2_3.3.2         
[19] tidyverse_1.3.0       

loaded via a namespace (and not attached):
  [1] ggnewscale_0.4.3    ggbeeswarm_0.6.0    colorspace_1.4-1   
  [4] ggridges_0.5.2      ellipsis_0.3.1      rio_0.5.16         
  [7] qvalue_2.20.0       fs_1.5.0            rstudioapi_0.11    
 [10] farver_2.0.3        graphlayouts_0.7.0  ggrepel_0.8.2      
 [13] bit64_4.0.5         scatterpie_0.1.5    fansi_0.4.1        
 [16] lubridate_1.7.9     xml2_1.3.2          splines_4.0.2      
 [19] GOSemSim_2.14.2     polyclip_1.10-0     jsonlite_1.7.1     
 [22] broom_0.7.0         dbplyr_1.4.4        graph_1.66.0       
 [25] pheatmap_1.0.12     graphite_1.34.0     ggforce_0.3.2      
 [28] BiocManager_1.30.10 compiler_4.0.2      httr_1.4.2         
 [31] rvcheck_0.1.8       backports_1.1.9     assertthat_0.2.1   
 [34] Matrix_1.2-18       cli_2.0.2           tweenr_1.0.1       
 [37] tools_4.0.2         igraph_1.2.5        gtable_0.3.0       
 [40] glue_1.4.2          reshape2_1.4.4      DO.db_2.9          
 [43] rappdirs_0.3.1      ggthemes_4.2.0      fastmatch_1.1-0    
 [46] Rcpp_1.0.5          enrichplot_1.9.3    carData_3.0-4      
 [49] cellranger_1.1.0    vctrs_0.3.4         ggraph_2.0.3       
 [52] openxlsx_4.1.5      rvest_0.3.6         lifecycle_0.2.0    
 [55] rstatix_0.6.0       DOSE_3.14.0         MASS_7.3-53        
 [58] scales_1.1.1        tidygraph_1.2.0     reactome.db_1.70.0 
 [61] hms_0.5.3           RColorBrewer_1.1-2  yaml_2.2.1         
 [64] curl_4.3            memoise_1.1.0       gridExtra_2.3      
 [67] downloader_0.4      stringi_1.5.3       RSQLite_2.2.0      
 [70] checkmate_2.0.0     zip_2.1.1           BiocParallel_1.22.0
 [73] rlang_0.4.7         pkgconfig_2.0.3     lattice_0.20-41    
 [76] shadowtext_0.0.7    cowplot_1.1.0       bit_4.0.4          
 [79] tidyselect_1.1.0    plyr_1.8.6          magrittr_1.5       
 [82] R6_2.4.1            generics_0.0.2      DBI_1.1.0          
 [85] pillar_1.4.6        haven_2.3.1         foreign_0.8-80     
 [88] withr_2.2.0         abind_1.4-5         modelr_0.1.8       
 [91] crayon_1.3.4        car_3.0-9           viridis_0.5.1      
 [94] grid_4.0.2          readxl_1.3.1        data.table_1.13.0  
 [97] blob_1.2.1          reprex_0.3.0        digest_0.6.25      
[100] gridGraphics_0.5-0  munsell_0.5.0       beeswarm_0.2.3     
[103] viridisLite_0.3.0   ggplotify_0.0.5     vipor_0.4.5

A similar issue was reported on GitHub, but it received no response:

I have made sure I'm using the latest version, checked the clusterProfiler vignettes, and googled the issue, but I could not find a solution. Please advise on how to proceed.

Thank you very much!

8 months ago
huerqiang

This happens because your input data has a column named "ID". Just rename that column:

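df <- read_tsv("deg_compareClust.tsv")
# Rename the "ID" column (by name, not position) so it does not clash
# with the "ID" column that compareCluster() adds to its result table
names(df)[names(df) == "ID"] <- "group1"
res <- compareCluster(ENTREZID~group1+expGrp, data = df, fun = "enrichPathway")
dotplot(res)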
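A slightly more defensive version of this fix renames any input column that collides with a column compareCluster() writes into its result, and it does so by name rather than by position. The helper `rename_reserved_cols()` and the `_input` suffix are hypothetical. The list of reserved names is copied from the result tables printed in this thread and is an assumption, not an official list from clusterProfiler's documentation.

```r
# ASSUMPTION: reserved names taken from the result tables shown in this thread,
# not from clusterProfiler documentation.
reserved_cols <- c("Cluster", "ID", "Description", "GeneRatio", "BgRatio",
                   "pvalue", "p.adjust", "qvalue", "geneID", "Count")

# Hypothetical helper: append a suffix to every column whose name is reserved.
rename_reserved_cols <- function(df, reserved = reserved_cols, suffix = "_input") {
  clash <- names(df) %in% reserved
  names(df)[clash] <- paste0(names(df)[clash], suffix)
  df
}

# Toy input (made-up rows) with the problematic "ID" grouping column
df <- data.frame(ENTREZID = c("6167", "6206"),
                 ID       = c("C3", "C3"),
                 expGrp   = c("downregulated", "downregulated"),
                 stringsAsFactors = FALSE)
df2 <- rename_reserved_cols(df)
print(names(df2))
#> [1] "ENTREZID" "ID_input" "expGrp"

# Then, with clusterProfiler and ReactomePA loaded, use the new name in the formula:
# res <- compareCluster(ENTREZID ~ ID_input + expGrp, data = df2, fun = "enrichPathway")
```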

Thank you very much for the solution. It worked! I'm curious, though: why does the "ID" column interfere with the plotting? I don't see any duplicated columns in the data frame...


That's because the compareClusterResult slot of the object returned by compareCluster already contains an "ID" column (the pathway ID), in addition to the grouping columns from your formula ("group1" and "expGrp" in the input file). If your input file also has a column named "ID", the two columns conflict.

> res@compareClusterResult[1:5, 1:5]
           Cluster group1        expGrp           ID                                                       Description
1 C3.downregulated     C3 downregulated R-HSA-156827 L13a-mediated translational silencing of Ceruloplasmin expression
2 C3.downregulated     C3 downregulated  R-HSA-72706           GTP hydrolysis and joining of the 60S ribosomal subunit
3 C3.downregulated     C3 downregulated  R-HSA-72613                                 Eukaryotic Translation Initiation
4 C3.downregulated     C3 downregulated  R-HSA-72737                              Cap-dependent Translation Initiation
5 C3.downregulated     C3 downregulated R-HSA-156902                                          Peptide chain elongation
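To see why a second "ID" column causes trouble, the toy example below builds a table that carries both a grouping column and a pathway column named "ID". The values are copied from the output above. It only illustrates base R behaviour: cbind() on data frames keeps duplicated column names, and `$` (like `[[`) then silently returns the first match. How clusterProfiler actually assembles its result table is not shown here and may differ.

```r
# Grouping columns from the formula; here the user's column was named "ID"
groups <- data.frame(Cluster = "C3.downregulated", ID = "C3",
                     expGrp = "downregulated", stringsAsFactors = FALSE)

# Enrichment columns, which also include a pathway "ID"
enrich <- data.frame(ID = "R-HSA-156827",
                     Description = "L13a-mediated translational silencing of Ceruloplasmin expression",
                     stringsAsFactors = FALSE)

# cbind() on data frames does not de-duplicate names, so "ID" appears twice
combined <- cbind(groups, enrich)
print(names(combined))
#> [1] "Cluster"     "ID"          "expGrp"      "ID"          "Description"

# `$` returns only the first "ID" column (the group label, not the pathway ID),
# so any code that looks up "ID" expecting pathway IDs gets the wrong values
print(combined$ID)
#> [1] "C3"
```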

I see... I didn't realize the result table has its own "ID" column, which conflicted with mine because I had used the name "ID" for one of my grouping columns. Thank you very much again for your solution!

