Question: mgeneSim function in GOSemSim package: error: r in go_matrix[gos[[i]], gos[[j]]] : incorrect number of dimensions
0
gravatar for 766706903
8 months ago by
7667069030
7667069030 wrote:

Question

When I run the mgeneSim function in the GOSemsim package, I choose ontology "BP" or "CC", the error pop up as follows. It seems that only these genes with "BP" or "CC" cause problems. ontology term "MF" doesn't cause error, but it returns a value 1 instead of a matrix.

The source code is here

rm(list = ls())
# source("https://bioconductor.org/biocLite.R")
# biocLite("GOSemSim")
library("GOSemSim")
# source("https://bioconductor.org/biocLite.R")
# biocLite("org.Hs.eg.db")
library("org.Hs.eg.db")
# install.packages("rvcheck")
library("rvcheck")
rvcheck::check_bioc("GOSemSim")
rvcheck::check_bioc("org.Hs.eg.db")
R.Version()
sessionInfo()

sim_db <- godata('org.Hs.eg.db', ont="BP", computeIC=TRUE)

an_Entrez_ID_vector <- c("9895", "404636", "54621", "115361")
names(an_Entrez_ID_vector) <- c("TECPR2", "FAM45A", "VSIG10", "GBP4")
an_Entrez_ID_vector

similarity_matrix <- mgeneSim(an_Entrez_ID_vector, semData = sim_db,
                              measure = "Wang", drop = "IEA",
                              combine = "avg", verbose = TRUE)

The running result is here

> rm(list = ls())

> # source("https://bioconductor.org/biocLite.R")
> # biocLite("GOSemSim")
> library("GOSemSim")

> # source("https://bioconductor.org/biocLite.R")
> # biocLite("org.Hs.eg.db")
> library("org.Hs.eg.db")

> # install.packages("rvcheck")
> library("rvcheck")

> rvcheck::check_bioc("GOSemSim")
package is up-to-date release version
$`package`
[1] "GOSemSim"

$installed_version
[1] "2.6.0"

$latest_version
[1] "2.6.0"

$up_to_date
[1] TRUE


> rvcheck::check_bioc("org.Hs.eg.db")
package is up-to-date release version
$`package`
[1] "org.Hs.eg.db"

$installed_version
[1] "3.6.0"

$latest_version
[1] "3.6.0"

$up_to_date
[1] TRUE


> R.Version()
$`platform`
[1] "x86_64-w64-mingw32"

$arch
[1] "x86_64"

$os
[1] "mingw32"

$system
[1] "x86_64, mingw32"

$status
[1] ""

$major
[1] "3"

$minor
[1] "5.1"

$year
[1] "2018"

$month
[1] "07"

$day
[1] "02"

$`svn rev`
[1] "74947"

$language
[1] "R"

$version.string
[1] "R version 3.5.1 (2018-07-02)"

$nickname
[1] "Feather Spray"


> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] org.Hs.eg.db_3.6.0   AnnotationDbi_1.42.1 IRanges_2.14.10      S4Vectors_0.18.3     Biobase_2.40.0      
[6] BiocGenerics_0.26.0  GOSemSim_2.6.0       rvcheck_0.1.0        BiocInstaller_1.30.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18    GO.db_3.6.0     digest_0.6.15   DBI_1.0.0       RSQLite_2.1.1   blob_1.1.1     
 [7] tools_3.5.1     bit64_0.9-7     bit_1.1-14      compiler_3.5.1  pkgconfig_2.0.1 memoise_1.1.0  

> sim_db <- godata('org.Hs.eg.db', ont="BP", computeIC=TRUE)
preparing gene to GO mapping data...
preparing IC data...

> an_Entrez_ID_vector <- c("9895", "404636", "54621", "115361")

> names(an_Entrez_ID_vector) <- c("TECPR2", "FAM45A", "VSIG10", "GBP4")

> an_Entrez_ID_vector
  TECPR2   FAM45A   VSIG10     GBP4 
  "9895" "404636"  "54621" "115361" 

> similarity_matrix <- mgeneSim(an_Entrez_ID_vector, semData = sim_db,
+                               measure = "Wang", drop = "IEA",
+               .... [TRUNCATED] 
  |==========                                                                             |  10%
Error in go_matrix[gos[[i]], gos[[j]]] : incorrect number of dimensions
ADD COMMENTlink modified 8 months ago • written 8 months ago by 7667069030

good.......................

ADD REPLYlink modified 8 months ago • written 8 months ago by 7667069030

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment comment should have gone under @Guangchuang's answer.

ADD REPLYlink written 8 months ago by genomax65k

Okay. Thanks for your suggestions. This is my first time posting here.

ADD REPLYlink written 8 months ago by 7667069030

Do not use the SUBMIT ANSWER window/button to post additional content/comments.

If you are posting from china and are not able to use the ADD REPLY/ADD COMMENT buttons then try switching to chrome browser. That generally works.

ADD REPLYlink written 8 months ago by genomax65k

I don't know how to post to a correct place...

ADD REPLYlink modified 8 months ago • written 8 months ago by 7667069030

posting to a correct place is very hard...

ADD REPLYlink modified 8 months ago • written 8 months ago by 7667069030
0
gravatar for Guangchuang Yu
8 months ago by
Guangchuang Yu2.1k
China/Hong Kong/The University of Hong Kong
Guangchuang Yu2.1k wrote:
> clusterProfiler::bitr(an_Entrez_ID_vector, 'ENTREZID', 'GO', OrgDb='org.Hs.eg.db')  
'select()' returned 1:many mapping between keys and columns
  ENTREZID         GO EVIDENCE ONTOLOGY
1     9895 GO:0005515      IPI       MF
2     9895 GO:0006914      IEA       BP
4    54621 GO:0016021      IEA       CC
5   115361 GO:0000139      IEA       CC
6   115361 GO:0003924      IEA       MF
7   115361 GO:0005525      IEA       MF
8   115361 GO:0005634      IEA       CC
9   115361 GO:0048471      IEA       CC
Warning message:
In clusterProfiler::bitr(an_Entrez_ID_vector, "ENTREZID", "GO",  :
  25% of input gene IDs are fail to map...

this is due to none of your genes can be annotated by GO (if we exclude IEA).

ADD COMMENTlink written 8 months ago by Guangchuang Yu2.1k

Thanks for explanation. I understand now where the issues arise.

If I set drop = NULL, I got a result value for "BP" by calling mgeneSim function, but I expected a matrix. Since there is only one "BP"-related GO term: GO:0006914, why is the similarity value 1, not 0? After all, the only one "BP" GO term can not be similar to anything in the example four-gene cluster.

Source Code

sim_db <- godata('org.Hs.eg.db', ont="BP", computeIC=TRUE)

an_Entrez_ID_vector <- c("9895", "404636", "54621", "115361")
names(an_Entrez_ID_vector) <- c("TECPR2", "FAM45A", "VSIG10", "GBP4")
an_Entrez_ID_vector


similarity_matrix <- mgeneSim(an_Entrez_ID_vector, semData = sim_db,
                              measure = "Wang", drop = NULL,
                              combine = "avg", verbose = TRUE)

similarity_matrix

similarity_score <- combineScores(similarity_matrix, combine = "avg")
similarity_score

Result

> sim_db <- godata('org.Hs.eg.db', ont="BP", computeIC=TRUE)
preparing gene to GO mapping data...
preparing IC data...
> 
> an_Entrez_ID_vector <- c("9895", "404636", "54621", "115361")
> names(an_Entrez_ID_vector) <- c("TECPR2", "FAM45A", "VSIG10", "GBP4")
> an_Entrez_ID_vector
  TECPR2   FAM45A   VSIG10     GBP4 
  "9895" "404636"  "54621" "115361" 
> 
> 
> similarity_matrix <- mgeneSim(an_Entrez_ID_vector, semData = sim_db,
+                               measure = "Wang", drop = NULL,
+                               combine = "avg", verbose = TRUE)
  |===================================================| 100%
> 
> similarity_matrix
[1] 1
> 
> similarity_score <- combineScores(similarity_matrix, combine = "avg")
> similarity_score
[1] 1
ADD REPLYlink modified 8 months ago • written 8 months ago by 7667069030

By design, all the NA columns and rows will be removed.

This is why the output is 1. As you only have 1 GO term and the similarity of a GO term to itself is always 1 using Wang's method.

ADD REPLYlink written 8 months ago by Guangchuang Yu2.1k

Thanks for your explanation.

I have another two genes as an example cluster. One gene has no "BP", the other gene has multiple "BP"s, the similarity calculated is 0.189. Since there is only one gene with "BP" GO terms, the GO similarity of a two-gene cluster is calculated based on the average similarity among these "BP" GO terms of that one gene. The definition of GO similarity score is based on available GO terms in a cluster not based on the gene level in a cluster. Am I right?

Source code

an_Entrez_ID_vector <- c("113177", "3600")
names(an_Entrez_ID_vector) <- c("C19orf36", "IL15")
print(an_Entrez_ID_vector)

clusterProfiler::bitr(an_Entrez_ID_vector, 'ENTREZID', 'GO', OrgDb='org.Hs.eg.db')

similarity_matrix <- mgeneSim(an_Entrez_ID_vector, semData = sim_db,
                              measure = the_measure_method, drop = NULL,
                              combine = "avg", verbose = TRUE)
if (length(similarity_matrix) == 1) {
    print("non-matrix found")
}

similarity_score <- combineScores(similarity_matrix, combine = "avg")

print(similarity_matrix)
print(similarity_score)

Result

> an_Entrez_ID_vector <- c("113177", "3600")
>     names(an_Entrez_ID_vector) <- c("C19orf36", "IL15")
>     print(an_Entrez_ID_vector)
C19orf36     IL15 
"113177"   "3600" 
>     
>     clusterProfiler::bitr(an_Entrez_ID_vector, 'ENTREZID', 'GO', OrgDb='org.Hs.eg.db')
'select()' returned 1:many mapping between keys and columns
   ENTREZID         GO EVIDENCE ONTOLOGY
1    113177 GO:0005576      IEA       CC
2    113177 GO:0005634      HDA       CC
3    113177 GO:0005634      IBA       CC
4      3600 GO:0001779      IEA       BP
5      3600 GO:0001866      IEA       BP
6      3600 GO:0005125      IEA       MF
7      3600 GO:0005126      IEA       MF
8      3600 GO:0005515      IPI       MF
9      3600 GO:0005576      NAS       CC
10     3600 GO:0005576      TAS       CC
11     3600 GO:0005615      IEA       CC
12     3600 GO:0005654      IDA       CC
13     3600 GO:0005737      TAS       CC
14     3600 GO:0005768      TAS       CC
15     3600 GO:0005794      TAS       CC
16     3600 GO:0005829      IDA       CC
17     3600 GO:0005887      TAS       CC
18     3600 GO:0006954      IEA       BP
19     3600 GO:0006955      IEA       BP
20     3600 GO:0007165      TAS       BP
21     3600 GO:0007260      IDA       BP
22     3600 GO:0007267      TAS       BP
23     3600 GO:0007568      IEA       BP
24     3600 GO:0008284      TAS       BP
25     3600 GO:0009986      IEA       CC
26     3600 GO:0010469      IEA       BP
27     3600 GO:0014732      IEA       BP
28     3600 GO:0016020      TAS       CC
29     3600 GO:0016607      IDA       CC
30     3600 GO:0030212      IEA       BP
31     3600 GO:0032740      IDA       BP
32     3600 GO:0032819      IEA       BP
33     3600 GO:0032825      IEA       BP
34     3600 GO:0034105       IC       BP
35     3600 GO:0035723      IDA       BP
36     3600 GO:0035723      TAS       BP
37     3600 GO:0042102      IEA       BP
38     3600 GO:0042531      IEA       BP
39     3600 GO:0045062      IEA       BP
40     3600 GO:0045580      IEA       BP
41     3600 GO:0048469      IEA       BP
42     3600 GO:0048535      IEA       BP
43     3600 GO:0048662      IEA       BP
44     3600 GO:0050691      IEA       BP
45     3600 GO:0050729       IC       BP
46     3600 GO:0050778      IEA       BP
47     3600 GO:0071305      IEA       BP
48     3600 GO:1904100      IEA       BP
>     
>     similarity_matrix <- mgeneSim(an_Entrez_ID_vector, semData = sim_db,
+                                   measure = the_measure_method, drop = NULL,
+                                   combine = "avg", verbose = TRUE)
  |=======================================================| 100%
>     if (length(similarity_matrix) == 1) {
+         print("non-matrix found")
+     }
[1] "non-matrix found"
>     
>     similarity_score <- combineScores(similarity_matrix, combine = "avg")
> 
>     print(similarity_matrix)
[1] 0.189
>     print(similarity_score)
[1] 0.189
ADD REPLYlink modified 8 months ago • written 8 months ago by 7667069030

in GOSemSim v>=2.6.1, mgeneSim will always output a matrix.

> similarity_matrix
      3600
3600 0.189
ADD REPLYlink written 8 months ago by Guangchuang Yu2.1k

always outputing a matrix is good. My question is how to get version 2.6.1? On bioconductor, I can only install 2.6.0.

> source("https://bioconductor.org/biocLite.R")
Bioconductor version 3.7 (BiocInstaller 1.30.0), ?biocLite for help
> biocLite("GOSemSim")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.7 (BiocInstaller 1.30.0), R 3.5.1 (2018-07-02).
Installing package(s) ‘GOSemSim’
trying URL 'https://bioconductor.org/packages/3.7/bioc/bin/windows/contrib/3.5/GOSemSim_2.6.0.zip'
Content type 'application/zip' length 1300137 bytes (1.2 MB)
downloaded 1.2 MB

package ‘GOSemSim’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\littlefairy\AppData\Local\Temp\RtmpOQSwOG\downloaded_packages
Old packages: 'stringi'
Update all/some/none? [a/s/n]: 
a

  There is a binary version available but the source version is later:
        binary source needs_compilation
stringi  1.1.7  1.2.4              TRUE

  Binaries will be installed
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.5/stringi_1.1.7.zip'
Content type 'application/zip' length 14368013 bytes (13.7 MB)
downloaded 13.7 MB

package ‘stringi’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\littlefairy\AppData\Local\Temp\RtmpOQSwOG\downloaded_packages
> library("rvcheck")
> rvcheck::check_bioc("GOSemSim")
package is up-to-date release version
$`package`
[1] "GOSemSim"

$installed_version
[1] "2.6.0"

$latest_version
[1] "2.6.0"

$up_to_date
[1] TRUE
ADD REPLYlink written 8 months ago by 7667069030
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1211 users visited in the last hour