Question: Convert GDCResults object (nested lists) to a human readable data frame
Asked 3 months ago by user31888 (United States):

Is there a way to convert a GDCResults object (i.e. nested lists) obtained with the R package GenomicDataCommons into a data frame?

test sample:

library(GenomicDataCommons)
test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10)

I tried converting it to a data frame using the code mentioned here and here, but both return a two-column data frame with hundreds of rows (not very handy). I also lose the column names when converting to a matrix:

df <- as.data.frame(matrix(unlist(test), nrow=length(unlist(test[1]))), stringsAsFactors=F)
modified 3 months ago • written 3 months ago by user31888
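On the lost column names: matrix() does not carry over the names of the vector it is given, so anything unlist() produced as a name disappears at that step. A minimal sketch on a made-up list (the `nested` object and its values are hypothetical stand-ins, not real GDC output):

```r
# Hypothetical stand-in for a simple list result; not real GDC output.
nested <- list(
  case_id      = c("a", "b"),
  primary_site = c("Liver", "Bile duct")
)

# unlist() keeps the information as names on a flat vector ...
flat <- unlist(nested)

# ... but matrix() ignores those names, so the result has no dimnames.
m <- matrix(flat, nrow = 2)

# Converting the list directly keeps one column per named element.
df <- as.data.frame(nested, stringsAsFactors = FALSE)
```

This only works directly when every element has the same length, which is probably not the case for the real result.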
https://www.rdocumentation.org/packages/GenomicDataCommons/versions/1.3.1/topics/as.data.frame.GDCResults

Copied from that page:

expands = c("diagnoses","diagnoses.treatments","annotations", "demographic","exposures")
head(cases() %>% expand(expands) %>% results() %>% as.data.frame())
written 3 months ago by cpad0112

No luck.

# Not working with 'as.data.frame()'
> expands = c("diagnoses","diagnoses.treatments","annotations","demographic","exposures")
> head(cases() %>% expand(expands) %>% results() %>% as.data.frame())
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
   arguments imply differing number of rows: 1, 0

# Not working with 'as.data.frame.GDCResults()'
> head(cases() %>% expand(expands) %>% results() %>% as.data.frame.GDCResults())
Error in as.data.frame.GDCResults(.) :
  could not find function "as.data.frame.GDCResults"

# Working without 'as.data.frame()'
> head(cases() %>% expand(expands) %>% results())
modified 3 months ago • written 3 months ago by user31888
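A note on the "could not find function" error: it is what R reports when an S3 method is registered in a package namespace but not exported. I have not checked the GenomicDataCommons NAMESPACE, so treat that as an assumption about this package. If it holds, the generic as.data.frame() already dispatches to the method, and the missing name is not the real problem (the "differing number of rows" error is). The sketch below uses t.test.default from stats, which is registered but not exported, as a stand-in:

```r
# t.test.default is a registered but unexported S3 method in 'stats',
# used here only as a stand-in for the situation in this thread.
visible_by_name <- exists("t.test.default")

# getS3method() retrieves a registered method even when it is not exported.
m <- getS3method("t.test", "default")

# The generic finds the method on its own, so calling it by name is unnecessary.
res <- t.test(c(1.1, 2.3, 2.9, 4.2))

# Assumption, not run here: the same lookup for the method discussed above
# would be getS3method("as.data.frame", "GDCResults").
```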

@OP: Try this. Tagging the author, Sean Davis.

library("GenomicDataCommons")
test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10) %>% as.data.frame()

Unfortunately, I am not able to connect to the GDC server.

modified 3 months ago • written 3 months ago by cpad0112

I've just reinstalled GenomicDataCommons and all its dependencies. Now I cannot connect to the server either.

> source('https://bioconductor.org/biocLite.R')
> biocLite('Bioconductor/GenomicDataCommons')

> library(GenomicDataCommons)

> test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10) %>% as.data.frame()
Error in curl::curl_fetch_memory(url, handle = handle) :
  Could not resolve host: gdc-api.nci.nih.gov

> status()
Error in curl::curl_fetch_memory(url, handle = handle) :
  Could not resolve host: gdc-api.nci.nih.gov

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux release 6.2 (Carbon)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicDataCommons_1.2.0 magrittr_1.5             BiocInstaller_1.28.0
[4] RevoUtils_10.0.8         RevoUtilsMath_10.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17           xml2_1.2.0             XVector_0.18.0
 [4] GenomicRanges_1.30.3   BiocGenerics_0.24.0    hms_0.4.2
 [7] zlibbioc_1.24.0        IRanges_2.12.0         R6_2.2.2
[10] rlang_0.2.1            httr_1.3.1             GenomeInfoDb_1.14.0
[13] tools_3.4.3            parallel_3.4.3         data.table_1.11.4
[16] lazyeval_0.2.1         tibble_1.4.2           crayon_1.3.4
[19] GenomeInfoDbData_1.0.0 readr_1.1.1            S4Vectors_0.16.0
[22] bitops_1.0-6           curl_3.2               RCurl_1.95-4.11
[25] pillar_1.3.0           compiler_3.4.3         stats4_3.4.3
[28] jsonlite_1.5           pkgconfig_2.0.1

The version available from Bioconductor (and installed on my system) is 1.2.0. The documentation linked above that describes the as.data.frame.GDCResults function is for version 1.3.1, so maybe the function was added recently. Where can we get version 1.3.1 or 1.3.4?

modified 3 months ago • written 3 months ago by user31888

I installed GenomicDataCommons v1.5.4 on macOS WITHOUT updating dependencies. I can connect to the GDC server, but as.data.frame is still not working (note that as.data.frame.GDCResults does not seem to exist).

> source('https://bioconductor.org/biocLite.R')
> biocLite('Bioconductor/GenomicDataCommons')

> test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10) %>% as.data.frame()
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  arguments imply differing number of rows: 2, 4, 5, 3

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicDataCommons_1.5.4 magrittr_1.5             BiocInstaller_1.28.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17               compiler_3.4.4             pillar_1.2.3               git2r_0.21.0               GenomeInfoDb_1.14.0
 [6] XVector_0.18.0             bindr_0.1.1                bitops_1.0-6               tools_3.4.4                zlibbioc_1.24.0
[11] digest_0.6.15              jsonlite_1.5               memoise_1.1.0              tibble_1.4.2               lattice_0.20-35
[16] pkgconfig_2.0.1            rlang_0.2.1                Matrix_1.2-14              DelayedArray_0.4.1         curl_3.2
[21] parallel_3.4.4             bindrcpp_0.2.2             GenomeInfoDbData_1.0.0     xml2_1.2.0                 withr_2.1.2
[26] httr_1.3.1                 dplyr_0.7.6                knitr_1.20                 hms_0.4.2                  rappdirs_0.3.1
[31] S4Vectors_0.16.0           IRanges_2.12.0             devtools_1.13.5            stats4_3.4.4               grid_3.4.4
[36] tidyselect_0.2.4           glue_1.3.0                 Biobase_2.38.0             R6_2.2.2                   tcltk_3.4.4
[41] readr_1.1.1                purrr_0.2.5                matrixStats_0.53.1         BiocGenerics_0.24.0        GenomicRanges_1.30.3
[46] assertthat_0.2.0           SummarizedExperiment_1.8.1 lazyeval_0.2.1             RCurl_1.95-4.10
modified 3 months ago • written 3 months ago by user31888
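The "differing number of rows" message is the usual symptom of building a data frame from a list whose elements have unequal lengths. A possible workaround, sketched under the assumption that the result holds one entry per case for each field and that some entries are themselves vectors of varying length: collapse each entry to a single string before converting. The `res` object and the `flatten_field` helper below are hypothetical; they are not real GDC output and not part of GenomicDataCommons.

```r
# Hypothetical field-oriented result: one entry per case, some entries ragged.
res <- list(
  case_id    = c("c1", "c2", "c3"),
  sample_ids = list(c("s1", "s2"), c("s3", "s4", "s5"), "s6")
)

# Direct conversion fails because the inner vectors have lengths 2, 3 and 1.
err <- tryCatch(as.data.frame(res), error = function(e) conditionMessage(e))

# Hypothetical helper: collapse each entry to one string so every field
# ends up with exactly one value per case. Everything becomes character.
flatten_field <- function(x) {
  vapply(x, function(v) paste(unlist(v), collapse = ";"),
         character(1), USE.NAMES = FALSE)
}

df <- as.data.frame(lapply(res, flatten_field), stringsAsFactors = FALSE)
```

Collapsing loses the types and the nesting, so this is only suited to getting a readable overview. Deeper fields such as diagnoses would need their own handling.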

GenomicDataCommons v1.2.0 has the as.data.frame.GDCResults function. Try help(package = GenomicDataCommons) to see the functions. I think there is a basic functionality issue. Try GenomicDataCommons::status().

written 3 months ago by cpad0112

Correct. as.data.frame.GDCResults appears in the help pages for both v1.2.0 and v1.5.4. But still:

> test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10) %>% as.data.frame.GDCResults()
Error in as.data.frame.GDCResults(.) :
  could not find function "as.data.frame.GDCResults"

Also tried on Linux (v1.2.0 new install + dependencies update):

> GenomicDataCommons::status()
Error in curl::curl_fetch_memory(url, handle = handle) :
  Could not resolve host: gdc-api.nci.nih.gov

On macOS (v1.5.4 new install without dependencies update):

> GenomicDataCommons::status()
$commit
[1] "e9e20d6f97f2bf6dd3b3261e36ead57c56a4c7cc"

$data_release
[1] "Data Release 12.0 - June 13, 2018"

$status
[1] "OK"

$tag
[1] "1.14.1"

$version
[1] 1
modified 3 months ago • written 3 months ago by user31888