Question: Download all cases from TCGAbiolinks
aksam wrote:

Hi all, I would like to download the bulk RNA-seq data for all patients in the TCGA-LUAD cohort using TCGAbiolinks. Does this exist as a single matrix?

I have read the package vignette and can download individual cases. However, does TCGAbiolinks support downloading a single matrix covering all the patients?

I ask because when you download similar data from the Xena browser, you get a single 585-column matrix.

I tried this with TCGAbiolinks:

library(TCGAbiolinks)
library(dplyr)  # for %>%, arrange(), filter() and select() used below

test <- GDCquery(
    project = "TCGA-LUAD",
    data.category = "Gene expression",
    data.type = "Gene expression quantification",
    platform = "Illumina HiSeq",
    file.type = "results",
    legacy = TRUE
)
dim(getResults(test))

This returns 600 files.

I sorted the results by file size to check whether one file was much bigger than the others, but none is, so each of the 600 files appears to be a separate sample rather than a combined matrix:

getResults(test) %>% arrange(desc(file_size)) %>% head(10)

Finally, I looked at the duplicated cases. Some patients have one file for cancer tissue and one for normal tissue (this is OK), but other patients have 2 or 3 files that are all for cancer tissue. Which file should I choose?

res <- getResults(test)
# Submitter IDs that occur more than once (each listed only once)
dups <- unique(res$cases.submitter_id[duplicated(res$cases.submitter_id)])

for (id in dups) {
    print(id)
    # Sample types of every file that belongs to this case
    print(res %>% filter(cases.submitter_id == id) %>% select(sample_type))
}

Any help appreciated, thanks in advance

Hamid Ghaedi wrote:

Yes, it can give you a single matrix. Try this:

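library("TCGAbiolinks") # Bioconductor package

# Query every RNA-seq count file in the project
query_TCGA <- GDCquery(
  project = "TCGA-LUAD",
  data.category = "Transcriptome Profiling", # parameter enforced by GDCquery
  experimental.strategy = "RNA-Seq",
  workflow.type = "HTSeq - Counts") # newer GDC data releases list "STAR - Counts" instead

# Download the per-sample files returned by the query
GDCdownload(query = query_TCGA)

# Read all downloaded files into one SummarizedExperiment and save it to disk
dat <- GDCprepare(query = query_TCGA, save = TRUE, save.filename = "exp.rda")

# Expression matrix: genes in rows, samples in columns
rna <- as.data.frame(SummarizedExperiment::assay(dat))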
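
On the other part of the question (which file to keep when a patient has 2 or 3 tumor files), the TCGA aliquot barcode itself tells the files apart: characters 1-12 are the patient, 14-15 the sample type (01 = primary solid tumor, 11 = solid tissue normal), 16 the vial, 18-19 the portion and 22-25 the plate. Below is a rough base-R sketch of one way to keep a single aliquot per patient. The barcodes are made up for illustration, `parse_barcodes()` and `pick_one_per_patient()` are hypothetical helper names and not part of TCGAbiolinks, and the tie-break (first barcode in sorted order) is only an assumption chosen to be reproducible, not an official rule.

```r
# Hypothetical aliquot barcodes, invented for illustration only.
barcodes <- c(
  "TCGA-AA-0001-01A-11R-1111-07",
  "TCGA-AA-0001-01B-11R-2222-07",
  "TCGA-AA-0001-11A-11R-1111-07",
  "TCGA-BB-0002-01A-11R-1111-07",
  "TCGA-BB-0002-01A-12R-3333-07"
)

# Split the fixed-width fields of a TCGA aliquot barcode into columns.
parse_barcodes <- function(bc) {
  data.frame(
    barcode     = bc,
    patient     = substr(bc, 1, 12),   # e.g. TCGA-AA-0001
    sample_type = substr(bc, 14, 15),  # 01 = primary tumor, 11 = solid normal
    vial        = substr(bc, 16, 16),
    portion     = substr(bc, 18, 19),
    plate       = substr(bc, 22, 25),
    stringsAsFactors = FALSE
  )
}

# Keep one barcode per patient for a given sample type.
# Tie-break (an assumption): the first barcode in C-locale sorted order,
# i.e. the lowest vial, then portion, then plate.
pick_one_per_patient <- function(bc, keep_type = "01") {
  info <- parse_barcodes(bc)
  info <- info[info$sample_type == keep_type, , drop = FALSE]
  info <- info[order(info$barcode, method = "radix"), , drop = FALSE]
  info$barcode[!duplicated(info$patient)]
}

selected <- pick_one_per_patient(barcodes)
print(selected)
```

If the column names of `rna` are the aliquot barcodes (I believe that is what GDCprepare gives you, but check with `head(colnames(rna))`), something like `rna[, pick_one_per_patient(colnames(rna))]` would leave one tumor column per patient. Whatever rule you choose, state it in your methods, since different replicate-filtering rules give slightly different cohorts.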