Question: Convert GDCResults object (nested lists) to a human readable data frame
user31888 wrote:

Is there a way to convert a GDCResults object (i.e. nested lists) obtained with the R package GenomicDataCommons into a data frame?

test sample:

library(GenomicDataCommons)
test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10)

I tried converting it to a data frame using the code mentioned here and here, but both return a 2-column data frame with hundreds of rows (not very handy). I also lose the column names when converting to a matrix:

df <- as.data.frame(matrix(unlist(test), nrow=length(unlist(test[1]))), stringsAsFactors=F)
modified 2.4 years ago • written 2.4 years ago by user31888
https://www.rdocumentation.org/packages/GenomicDataCommons/versions/1.3.1/topics/as.data.frame.GDCResults

copy/pasted from webpage:

expands = c("diagnoses","diagnoses.treatments","annotations", "demographic","exposures")
head(cases() %>% expand(expands) %>% results() %>% as.data.frame())
written 2.4 years ago by cpad0112

No luck.

# Not working with 'as.data.frame()'
> expands = c("diagnoses","diagnoses.treatments","annotations","demographic","exposures")
> head(cases() %>% expand(expands) %>% results() %>% as.data.frame())
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
   arguments imply differing number of rows: 1, 0

# Not working with 'as.data.frame.GDCResults()'
> head(cases() %>% expand(expands) %>% results() %>% as.data.frame.GDCResults())
Error in as.data.frame.GDCResults(.) :
  could not find function "as.data.frame.GDCResults"

# Working without 'as.data.frame()'
> head(cases() %>% expand(expands) %>% results())
modified 2.4 years ago • written 2.4 years ago by user31888
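A note on the "could not find function" error above: in R, an S3 method can be registered in a package namespace without being exported. When that is the case, calling it by its full name fails even though the generic still dispatches to it. The sketch below illustrates the mechanism with `print.lm` from the stats package, which is a stand-in chosen for illustration. Whether `as.data.frame.GDCResults` is registered in the same way is an assumption, suggested only by the help pages discussed in this thread.

```r
# print.lm is an S3 method for print() that lives in the stats namespace
# but is not exported, so it cannot be called directly by name.
fit <- lm(dist ~ speed, data = cars)

# Calling the method by name fails; capture the error message instead of stopping.
direct <- tryCatch(print.lm(fit), error = function(e) conditionMessage(e))

# The registered method can still be looked up ...
m <- getS3method("print", "lm")
is.function(m)

# ... and the generic dispatches to it as usual.
print(fit)

# By analogy (assumption: the package registers the method), one would call
# as.data.frame(x) on a GDCResults object, or inspect the method with
# getS3method("as.data.frame", "GDCResults") after library(GenomicDataCommons).
```

If this is the cause, the second error is unrelated to the "differing number of rows" failure, which comes from the conversion itself.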

@OP: Try this. Tagging the author, Sean Davis.

library("GenomicDataCommons")
test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10) %>% as.data.frame()

Unfortunately, I am not able to connect to the GDC server.

modified 2.4 years ago • written 2.4 years ago by cpad0112

I've just reinstalled GenomicDataCommons and all its dependencies. I cannot connect to the server anymore either.

> source('https://bioconductor.org/biocLite.R')
> biocLite('Bioconductor/GenomicDataCommons')

> library(GenomicDataCommons)

> test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10) %>% as.data.frame()
Error in curl::curl_fetch_memory(url, handle = handle) :
  Could not resolve host: gdc-api.nci.nih.gov

> status()
Error in curl::curl_fetch_memory(url, handle = handle) :
  Could not resolve host: gdc-api.nci.nih.gov

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux release 6.2 (Carbon)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicDataCommons_1.2.0 magrittr_1.5             BiocInstaller_1.28.0
[4] RevoUtils_10.0.8         RevoUtilsMath_10.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17           xml2_1.2.0             XVector_0.18.0
 [4] GenomicRanges_1.30.3   BiocGenerics_0.24.0    hms_0.4.2
 [7] zlibbioc_1.24.0        IRanges_2.12.0         R6_2.2.2
[10] rlang_0.2.1            httr_1.3.1             GenomeInfoDb_1.14.0
[13] tools_3.4.3            parallel_3.4.3         data.table_1.11.4
[16] lazyeval_0.2.1         tibble_1.4.2           crayon_1.3.4
[19] GenomeInfoDbData_1.0.0 readr_1.1.1            S4Vectors_0.16.0
[22] bitops_1.0-6           curl_3.2               RCurl_1.95-4.11
[25] pillar_1.3.0           compiler_3.4.3         stats4_3.4.3
[28] jsonlite_1.5           pkgconfig_2.0.1

The version available from Bioconductor (installed on my system) is 1.2.0. The version of GenomicDataCommons describing the as.data.frame.GDCResults function here is 1.3.1. Maybe the function was added recently. Where can we get version 1.3.1 or 1.3.4?

modified 2.4 years ago • written 2.4 years ago by user31888

I installed GenomicDataCommons v1.5.4 on macOS WITHOUT updating dependencies. I can connect to the GDC server, but as.data.frame is still not working (note that as.data.frame.GDCResults does not seem to exist).

> source('https://bioconductor.org/biocLite.R')
> biocLite('Bioconductor/GenomicDataCommons')

> test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10) %>% as.data.frame()
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  arguments imply differing number of rows: 2, 4, 5, 3

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicDataCommons_1.5.4 magrittr_1.5             BiocInstaller_1.28.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17               compiler_3.4.4             pillar_1.2.3               git2r_0.21.0               GenomeInfoDb_1.14.0
 [6] XVector_0.18.0             bindr_0.1.1                bitops_1.0-6               tools_3.4.4                zlibbioc_1.24.0
[11] digest_0.6.15              jsonlite_1.5               memoise_1.1.0              tibble_1.4.2               lattice_0.20-35
[16] pkgconfig_2.0.1            rlang_0.2.1                Matrix_1.2-14              DelayedArray_0.4.1         curl_3.2
[21] parallel_3.4.4             bindrcpp_0.2.2             GenomeInfoDbData_1.0.0     xml2_1.2.0                 withr_2.1.2
[26] httr_1.3.1                 dplyr_0.7.6                knitr_1.20                 hms_0.4.2                  rappdirs_0.3.1
[31] S4Vectors_0.16.0           IRanges_2.12.0             devtools_1.13.5            stats4_3.4.4               grid_3.4.4
[36] tidyselect_0.2.4           glue_1.3.0                 Biobase_2.38.0             R6_2.2.2                   tcltk_3.4.4
[41] readr_1.1.1                purrr_0.2.5                matrixStats_0.53.1         BiocGenerics_0.24.0        GenomicRanges_1.30.3
[46] assertthat_0.2.0           SummarizedExperiment_1.8.1 lazyeval_0.2.1             RCurl_1.95-4.10
modified 2.4 years ago • written 2.4 years ago by user31888
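The "arguments imply differing number of rows" error is what data.frame() reports when the pieces it is given have different lengths, which is expected if some fields hold a variable number of nested values per case. One possible workaround is to build the data frame from the one-value-per-case fields and collapse each nested field into a single string per case. This is a minimal sketch on toy data: `res`, `is_flat` and the field names are hypothetical, and it assumes the results object behaves like a named list, which has not been checked against a live GDC query.

```r
# Toy stand-in for a nested results object (hypothetical data, not from the GDC API).
res <- list(
  case_id      = c("a", "b", "c"),
  primary_site = c("Liver", "Liver", "Bile duct"),
  sample_ids   = list(a = c("s1", "s2"), b = "s3", c = c("s4", "s5", "s6"))
)
n_cases <- length(res$case_id)

# Fields holding exactly one atomic value per case can go straight into a data frame.
is_flat <- vapply(res, function(x) is.atomic(x) && length(x) == n_cases, logical(1))
df <- as.data.frame(res[is_flat], stringsAsFactors = FALSE)

# Nested fields are collapsed to one ";"-separated string per case.
for (nm in names(res)[!is_flat]) {
  df[[nm]] <- unname(vapply(
    res[[nm]],
    function(x) paste(unlist(x), collapse = ";"),
    character(1)
  ))
}

df
```

Collapsing to strings keeps one row per case at the cost of structure; keeping the nested fields as list columns would be an alternative if the values need to be processed further.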

GenomicDataCommons v1.2.0 has the as.data.frame.GDCResults function. Try help(package = GenomicDataCommons) to see the list of functions. I think there is a basic functionality issue. Try GenomicDataCommons::status()

written 2.4 years ago by cpad0112
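To keep connectivity problems separate from conversion problems, the status check suggested above can be wrapped so that a failure returns a message instead of aborting the session. This is a rough sketch: `check_service` and the `offline` stub are hypothetical names introduced here, and the stub only imitates the error text seen earlier in the thread so the example runs without network access.

```r
# Run a status function and report whether it succeeded,
# returning the error message instead of stopping on failure.
check_service <- function(status_fun) {
  tryCatch(
    list(reachable = TRUE, detail = status_fun()),
    error = function(e) list(reachable = FALSE, detail = conditionMessage(e))
  )
}

# Stub that imitates the failure reported in this thread (no network needed).
offline <- function() stop("Could not resolve host: gdc-api.nci.nih.gov")

res <- check_service(offline)
res$reachable
res$detail

# With the package installed, the real check would be:
# check_service(GenomicDataCommons::status)
```

If the check fails, no conversion code can be tested on that machine. The different outcomes reported for v1.2.0 and v1.5.4 in this thread would be consistent with the older release pointing at a host name that no longer resolves, though that is only an inference.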

Correct. as.data.frame.GDCResults appears in the help for both v1.2.0 and v1.5.4. But still:

> test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10) %>% as.data.frame.GDCResults()
Error in as.data.frame.GDCResults(.) :
  could not find function "as.data.frame.GDCResults"

Also tried on Linux (v1.2.0 new install + dependencies update):

> GenomicDataCommons::status()
Error in curl::curl_fetch_memory(url, handle = handle) :
  Could not resolve host: gdc-api.nci.nih.gov

On macOS (v1.5.4 new install without dependencies update):

> GenomicDataCommons::status()
$commit
[1] "e9e20d6f97f2bf6dd3b3261e36ead57c56a4c7cc"

$data_release
[1] "Data Release 12.0 - June 13, 2018"

$status
[1] "OK"

$tag
[1] "1.14.1"

$version
[1] 1
modified 2.4 years ago • written 2.4 years ago by user31888