Hi all,
I am using the formula interface of compareCluster in clusterProfiler to look for enriched functional categories in each gene cluster. However, I cannot plot the results; I get an error message instead.
A reproducible example is shown below. The data table can be downloaded here: https://drive.google.com/file/d/18yPu7nwz9MX6O4Qngwo-BXIKIHqGS_Ge/view?usp=sharing
# Load packages
library(tidyverse)
library(clusterProfiler)
# Get data
df <- read_tsv("deg_compareClust.tsv")
I then run compareCluster with the following parameters and draw a dot plot.
res <- compareCluster(ENTREZID~ID+expGrp, data = df, fun = "enrichPathway")
dotplot(res)
I expected a dot plot of the enriched functional categories in each cluster, but got the error message below:
Error in `levels<-`(`*tmp*`, value = as.character(levels)) :
factor level [80] is duplicated
The resulting data frame looks OK to me. I checked for duplicated entries but could not find any conflicting Cluster-Description pairs.
> head(as.data.frame(res))
           Cluster ID        expGrp
1 C3.downregulated C3 downregulated
2 C3.downregulated C3 downregulated
3 C3.downregulated C3 downregulated
4 C3.downregulated C3 downregulated
5 C3.downregulated C3 downregulated
6 C3.downregulated C3 downregulated
                                                        Description
1 L13a-mediated translational silencing of Ceruloplasmin expression
2           GTP hydrolysis and joining of the 60S ribosomal subunit
3                                 Eukaryotic Translation Initiation
4                              Cap-dependent Translation Initiation
5                                          Peptide chain elongation
6                                            Viral mRNA Translation
  GeneRatio   BgRatio       pvalue     p.adjust       qvalue
1     10/24 111/10654 1.731717e-14 1.230307e-12 8.478454e-13
2     10/24 112/10654 1.899202e-14 1.230307e-12 8.478454e-13
3     10/24 119/10654 3.540453e-14 1.230307e-12 8.478454e-13
4     10/24 119/10654 3.540453e-14 1.230307e-12 8.478454e-13
5      9/24  89/10654 1.545417e-13 3.580216e-12 2.467245e-12
6      9/24  89/10654 1.545417e-13 3.580216e-12 2.467245e-12
                                             geneID Count
1 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
2 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
3 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
4 6167/6206/6133/6161/6227/6228/6146/6156/1975/6204    10
5      6167/6206/6133/6161/6227/6228/6146/6156/6204     9
6      6167/6206/6133/6161/6227/6228/6146/6156/6204     9
My sessionInfo() is as below:
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /usr/local/lib64/R/lib/libRblas.so
LAPACK: /usr/local/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=ja_JP.UTF-8 LC_NUMERIC=C
[3] LC_TIME=ja_JP.UTF-8 LC_COLLATE=ja_JP.UTF-8
[5] LC_MONETARY=ja_JP.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=ja_JP.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=ja_JP.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils
[7] datasets methods base
other attached packages:
[1] fgsea_1.14.0 ReactomePA_1.32.0
[3] GO.db_3.11.4 org.Hs.eg.db_3.11.4
[5] AnnotationDbi_1.50.3 IRanges_2.22.2
[7] S4Vectors_0.26.1 Biobase_2.48.0
[9] BiocGenerics_0.34.0 clusterProfiler_3.17.4
[11] forcats_0.5.0 stringr_1.4.0
[13] dplyr_1.0.2 purrr_0.3.4
[15] readr_1.3.1 tidyr_1.1.2
[17] tibble_3.0.3 ggplot2_3.3.2
[19] tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] ggnewscale_0.4.3 ggbeeswarm_0.6.0 colorspace_1.4-1
[4] ggridges_0.5.2 ellipsis_0.3.1 rio_0.5.16
[7] qvalue_2.20.0 fs_1.5.0 rstudioapi_0.11
[10] farver_2.0.3 graphlayouts_0.7.0 ggrepel_0.8.2
[13] bit64_4.0.5 scatterpie_0.1.5 fansi_0.4.1
[16] lubridate_1.7.9 xml2_1.3.2 splines_4.0.2
[19] GOSemSim_2.14.2 polyclip_1.10-0 jsonlite_1.7.1
[22] broom_0.7.0 dbplyr_1.4.4 graph_1.66.0
[25] pheatmap_1.0.12 graphite_1.34.0 ggforce_0.3.2
[28] BiocManager_1.30.10 compiler_4.0.2 httr_1.4.2
[31] rvcheck_0.1.8 backports_1.1.9 assertthat_0.2.1
[34] Matrix_1.2-18 cli_2.0.2 tweenr_1.0.1
[37] tools_4.0.2 igraph_1.2.5 gtable_0.3.0
[40] glue_1.4.2 reshape2_1.4.4 DO.db_2.9
[43] rappdirs_0.3.1 ggthemes_4.2.0 fastmatch_1.1-0
[46] Rcpp_1.0.5 enrichplot_1.9.3 carData_3.0-4
[49] cellranger_1.1.0 vctrs_0.3.4 ggraph_2.0.3
[52] openxlsx_4.1.5 rvest_0.3.6 lifecycle_0.2.0
[55] rstatix_0.6.0 DOSE_3.14.0 MASS_7.3-53
[58] scales_1.1.1 tidygraph_1.2.0 reactome.db_1.70.0
[61] hms_0.5.3 RColorBrewer_1.1-2 yaml_2.2.1
[64] curl_4.3 memoise_1.1.0 gridExtra_2.3
[67] downloader_0.4 stringi_1.5.3 RSQLite_2.2.0
[70] checkmate_2.0.0 zip_2.1.1 BiocParallel_1.22.0
[73] rlang_0.4.7 pkgconfig_2.0.3 lattice_0.20-41
[76] shadowtext_0.0.7 cowplot_1.1.0 bit_4.0.4
[79] tidyselect_1.1.0 plyr_1.8.6 magrittr_1.5
[82] R6_2.4.1 generics_0.0.2 DBI_1.1.0
[85] pillar_1.4.6 haven_2.3.1 foreign_0.8-80
[88] withr_2.2.0 abind_1.4-5 modelr_0.1.8
[91] crayon_1.3.4 car_3.0-9 viridis_0.5.1
[94] grid_4.0.2 readxl_1.3.1 data.table_1.13.0
[97] blob_1.2.1 reprex_0.3.0 digest_0.6.25
[100] gridGraphics_0.5-0 munsell_0.5.0 beeswarm_0.2.3
[103] viridisLite_0.3.0 ggplotify_0.0.5 vipor_0.4.5
A similar issue was reported on GitHub but received no response: https://github.com/YuLab-SMU/clusterProfiler/issues/116
I have made sure I'm using the latest version, checked the clusterProfiler vignettes, and googled the issue, but could not find a solution. Please advise on how to proceed.
Thank you very much!
Thank you very much for the solution. It worked! But I'm curious: why does the "ID" column interfere with the plotting? I don't see any duplicated columns in the data frame...
That's because the compareClusterResult slot of the compareCluster result already has its own "ID" column, alongside the "group1" and "expGrp" columns taken from the input file. If your input file also contains an "ID" column, the two conflict.
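To make the workaround concrete, here is a minimal sketch of renaming the clashing input column before calling compareCluster. The toy data frame is made up purely as a stand-in for deg_compareClust.tsv, and "group1" is simply the replacement name mentioned above; any name other than "ID" should serve the same purpose. The compareCluster and dotplot calls are left commented out because they need the real data and the ReactomePA package.

```r
# Toy stand-in for deg_compareClust.tsv (illustrative values only,
# not the real data set from the question)
df <- data.frame(
  ENTREZID = c("6167", "6206", "6133", "1975"),
  ID       = c("C3", "C3", "C1", "C1"),
  expGrp   = c("downregulated", "downregulated", "upregulated", "upregulated"),
  stringsAsFactors = FALSE
)

# Rename the input column that clashes with the "ID" column of the result
names(df)[names(df) == "ID"] <- "group1"

# With the real data loaded, the formula then refers to the new name
# (not run here; requires clusterProfiler and ReactomePA):
# res <- compareCluster(ENTREZID ~ group1 + expGrp, data = df, fun = "enrichPathway")
# dotplot(res)
```

After the rename, the input no longer carries a column called "ID", so the grouping variable cannot collide with the "ID" column of the enrichment result.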
I see... I didn't realize the result table has its own "ID" column, which is missing from mine because I used the name "ID" for my own data. Thank you very much again for your solution!