Problem in using TCGAvisualize_meanMethylation function
1
0
Entering edit mode
6.1 years ago

Hi, I used ‘TCGAbiolinks’ version 2.6.12. I have just tried to plot using TCGAvisualize_meanMethylation function, using flowing codes, but the R script didn't work well and flowing error came out:

R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> #----------------------------
> # Obtaining DNA methylation
> #----------------------------
> setwd("F:/worlshop_EWAS/New/TCGA/Matched_ex_met")
> library(TCGAbiolinks)
> library(stringr)
> matched_met_exp <- function(project, n = NULL){
+ # get primary solid tumor samples: DNA methylation
+ message("Download DNA methylation information")
+ met450k <- GDCquery(project = project,
+                     data.category = "DNA methylation",
+                     platform = "Illumina Human Methylation 450",
+                     legacy = TRUE,
+                     sample.type = c("Primary solid Tumor"))
+ met450k.tp <- met450k$results[[1]]$cases
+ # get primary solid tumor samples: RNAseq
+ message("Download gene expression information")
+ exp <- GDCquery(project = project,
+                 data.category = "Gene expression",
+                 data.type = "Gene expression quantification",
+                 platform = "Illumina HiSeq",
+                 file.type = "results",
+                 sample.type = c("Primary solid Tumor"),
+ legacy = TRUE)
+ exp.tp <- exp$results[[1]]$cases
+ printexp.tp[1:10])
+ # Get patients with samples in both platforms
+ patients <- unique(substrexp.tp,1,15)[substrexp.tp,1,12) %in% substrmet450k.tp,1,12)])
+ if(!is.null(n)) patients <- patients[1:n] # get only n samples
+ return(patients)
+ }
> lgg.samples <- matched_met_exp("TCGA-LGG", n = 10)
Download DNA methylation information
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg19
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-LGG
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By sample.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Download gene expression information
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg19
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-LGG
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By data.type
ooo By file.type
ooo By sample.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
 [1] "TCGA-HT-7480-01A-11R-2090-07" "TCGA-QH-A6XC-01A-12R-A32Q-07"
 [3] "TCGA-HT-A614-01A-11R-A29R-07" "TCGA-HT-8104-01A-11R-2404-07"
 [5] "TCGA-TQ-A7RG-01A-11R-A33Z-07" "TCGA-HW-7487-01A-11R-2027-07"
 [7] "TCGA-FG-8186-01A-11R-2256-07" "TCGA-P5-A5EX-01A-12R-A28M-07"
 [9] "TCGA-CS-6667-01A-12R-2027-07" "TCGA-DU-7304-01A-12R-2090-07"
> gbm.samples <- matched_met_exp("TCGA-GBM", n = 10)
Download DNA methylation information
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg19
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-GBM
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By sample.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Download gene expression information
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg19
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-GBM
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By data.type
ooo By file.type
ooo By sample.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
 [1] "TCGA-06-0184-01A-01R-1849-01" "TCGA-06-0649-01B-01R-1849-01"
 [3] "TCGA-02-2485-01A-01R-1849-01" "TCGA-26-5136-01B-01R-1850-01"
 [5] "TCGA-76-4925-01A-01R-1850-01" "TCGA-32-2632-01A-01R-1850-01"
 [7] "TCGA-06-5858-01A-01R-1849-01" "TCGA-28-5213-01A-01R-1850-01"
 [9] "TCGA-06-0157-01A-01R-1849-01" "TCGA-16-0846-01A-01R-1850-01"
> samples <- c(lgg.samples,gbm.samples)
> query.lgg <- GDCquery(project = "TCGA-LGG",
+                       data.category = "DNA methylation",
+                       platform = "Illumina Human Methylation 450",
+                       legacy = TRUE, barcode = lgg.samples)
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg19
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-LGG
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By barcode
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
> met.lgg <-GDCprepare(query.lgg, save = FALSE)
  |====================================================================================| 100%Downloading genome information (try:0) Using: Homo sapiens genes (GRCh37.p13)
Loading from disk
Starting to add information to samples
 => Add clinical information to samples
Add FFPE information. More information at: 
=> https://cancergenome.nih.gov/cancersselected/biospeccriteria 
=> http://gdac.broadinstitute.org/runs/sampleReports/latest/FPPP_FFPE_Cases.html
 => Adding subtype information to samples
lgg subtype information from:doi:10.1016/j.cell.2015.12.028
> query.gbm <- GDCquery(project = "TCGA-GBM",
+                       data.category = "DNA methylation",
+                       platform = "Illumina Human Methylation 450",
+                       legacy = TRUE, barcode = gbm.samples)
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg19
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-GBM
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By barcode
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
> met.gbm <- GDCprepare(query.gbm, save = FALSE)
  |====================================================================================| 100%Downloading genome information (try:0) Using: Homo sapiens genes (GRCh37.p13)
Loading from disk
Starting to add information to samples
 => Add clinical information to samples
Add FFPE information. More information at: 
=> https://cancergenome.nih.gov/cancersselected/biospeccriteria 
=> http://gdac.broadinstitute.org/runs/sampleReports/latest/FPPP_FFPE_Cases.html
 => Adding subtype information to samples
gbm subtype information from:doi:10.1016/j.cell.2015.12.028
> met <- SummarizedExperiment::cbind(met.lgg, met.gbm)
> met <- subset(met,subset = rowSumsis.na(assay(met))) == 0))
Error in assay(met) : could not find function "assay"
> # remove probes in chromossomes X, Y and NA
> met <- subset(met,subset = !as.character(seqnames(met)) %in% c("chrNA","chrX","chrY"))
Error in seqnames(met) : could not find function "seqnames"
> TCGAvisualize_meanMethylation(met,
+                                  groupCol = "disease_type",
+                                  group.legend = "Groups",
+                                  filename = "mean_lgg_gbm.png",
+                                  print.pvalue = TRUE)
==================== DATA Summary ====================
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 
  'x' must be atomic

if you know the solution, would you tell me , thanks in advance !

R • 2.5k views
ADD COMMENT
0
Entering edit mode

Thank you. Unfortunately, this is an error for the developers of the program. However, they do not seem to be doing any further work on TCGAbiolinks. Take a look at the GitHub issues page: https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues

You could obtain the methylation data from GDC direct, and analyse it that way.

ADD REPLY
0
Entering edit mode

Ok, I need use other packages. Do you have other workflow files to analyze methylation? Thank you.

ADD REPLY
0
Entering edit mode

Hi, friend. I solve this problem. By changing the last line of code: TCGAvisualize_meanMethylation(met, groupCol = "name", group.legend = "Groups", filename = "mean_lgg_gbm.png", print.pvalue = TRUE)

The picture obtained seems to be the same. I modified the parameter:groupCol,because that I was inspired by the results of the code:

str(met)
.. .. .. ..$ name : chr [1:20] "Brain Lower Grade Glioma" "Brain Lower Grade Glioma" "Brain Lower Grade Glioma" "Brain Lower Grade Glioma" ...

Do you think that it is feasible to do so?

ADD REPLY
0
Entering edit mode

The only question is that I get a different p-value. I am confused.

ADD REPLY
0
Entering edit mode

I encourage you to contact the TCGAbiolinks developers. The functions that they provide are very specific and not many have used them.

ADD REPLY
0
Entering edit mode

Ok, thank you. I will try.

ADD REPLY
3
Entering edit mode
6.1 years ago

Just add these to your code:

library(SummarizedExperiment)
library(GenomicRanges)

They are already installed on your R version but you do not explicitly load them.

ADD COMMENT
0
Entering edit mode

it still can't solve this problem

ADD REPLY
0
Entering edit mode

You are not the person who opened this question; thus, your problem is likely different. Which error message are you receiving?

ADD REPLY
0
Entering edit mode

Hi,I can't solve the same problem by your method.

[R.app GUI 1.70 (7543) x86_64-apple-darwin15.6.0]

[History restored from /Users/chaisongshan/.Rapp.history]

library(TCGAbiolinks)
library(stringr)
library(SummarizedExperiment)
载入需要的程辑包:GenomicRanges
载入需要的程辑包:stats4
载入需要的程辑包:BiocGenerics
载入需要的程辑包:parallel

载入程辑包:‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
    ...

载入需要的程辑包:S4Vectors

载入程辑包:‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

载入需要的程辑包:IRanges
载入需要的程辑包:GenomeInfoDb
载入需要的程辑包:Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

载入需要的程辑包:DelayedArray
载入需要的程辑包:matrixStats

载入程辑包:‘matrixStats’

The following objects are masked from ‘package:Biobase’:

    anyMissing, rowMedians

载入需要的程辑包:BiocParallel

载入程辑包:‘DelayedArray’

The following objects are masked from ‘package:matrixStats’:

    colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following objects are masked from ‘package:base’:

    aperm, apply

library(GenomicRanges)
# Samples
matched_met_exp <- function(project, n = NULL){
 message("Download DNA methylation information")
 met450k <- GDCquery(project = project,
 data.category = "DNA methylation",
 platform = "Illumina Human Methylation 450",
 legacy = TRUE,
 sample.type = c("Primary solid Tumor"))
 met450k.tp <- met450k$results[[1]]$cases

 # get primary solid tumor samples: RNAseq
 message("Download gene expression information")
 exp <- GDCquery(project = project,
 data.category = "Gene expression",
 data.type = "Gene expression quantification",
 platform = "Illumina HiSeq",
 file.type = "results",
 sample.type = c("Primary solid Tumor"),
 legacy = TRUE)
 exp.tp <- exp$results[[1]]$cases
 # Get patients with samples in both platforms
 patients <- uniquesubstrexp.tp,1,15)[substrexp.tp,1,12) %in% substrmet450k.tp,1,12)])
 if(!is.null(n)) patients <- patients[1:n] # get only n samples
 return(patients)
}

lgg.samples <- matched_met_exp("TCGA-LGG", n = 10)
Download DNA methylation information

gbm.samples <- matched_met_exp("TCGA-GBM", n = 10)
Download DNA methylation information

samples <- c(lgg.samples,gbm.samples)

query.lgg <- GDCquery(project = "TCGA-LGG",
+ data.category = "DNA methylation",
+ platform = "Illumina Human Methylation 450",
+ legacy = TRUE, barcode = lgg.samples)

GDCdownload(query.lgg)
Downloading data for project TCGA-LGG
Of the 10 files for download 10 already exist.
All samples have been already downloaded
met.lgg <-GDCprepare(query.lgg, save = FALSE)
  |=====================================================================| 100%Downloading genome information (try:0) Using: Homo sapiens genes (GRCh37.p13)

query.gbm <- GDCquery(project = "TCGA-GBM",
+ data.category = "DNA methylation",
+ platform = "Illumina Human Methylation 450",
+ legacy = TRUE, barcode = gbm.samples)

GDCdownload(query.gbm)
met.gbm <- GDCprepare(query.gbm, save = FALSE)
  |=====================================================================| 100%Downloading genome information (try:0) Using: Homo sapiens genes (GRCh37.p13)

met <- SummarizedExperiment::cbind(met.lgg, met.gbm)

met <- subset(met,subset = rowSumsis.na(SummarizedExperiment::assay(met))) == 0))

met <- subset(met,subset = !as.character(seqnames(met)) %in% c("chrNA","chrX","chrY"))

TCGAvisualize_meanMethylation(met, groupCol = "disease_type", group.legend = "Groups", filename = "mean_lgg_gbm.png", print.pvalue = TRUE)
==================== DATA Summary ====================
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 
  'x' must be atomic
ADD REPLY
0
Entering edit mode

I tidied your code to improve the visualisation of it. From where did you obtain this code? There are many 'bugs' (errors) with TCGAbiolinks.

ADD REPLY
0
Entering edit mode

I have reproduced the error on my computer. From where did you obtain the code?

ADD REPLY

Login before adding your answer.

Traffic: 1768 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6