2.3 years ago
aksam ▴ 10

Hi all, I would like to download the bulk RNA-seq data for all patients in the TCGA-LUAD cohort using TCGAbiolinks. Does this exist as a single matrix?

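library(TCGAbiolinks)
library(dplyr) # provides %>%, arrange(), filter() and select() used below

test <- GDCquery(
  project = 'TCGA-LUAD',
  data.category = 'Gene expression',
  data.type = 'Gene expression quantification',
  platform = "Illumina HiSeq",
  file.type = 'results',
  legacy = TRUE)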
dim(getResults(test))


This results in 600 files.

I tried the code below to see whether one file was much bigger than the others, but none is, so all 600 files appear to be separate cases:

getResults(test) %>% arrange(desc(file_size)) %>% head(10)


Finally, I looked at the duplicated cases. Some cases have one file for cancer tissue and one for normal tissue (this is OK), but other patients have 2 or 3 files, all for cancer tissue. Which file should I choose?

res <- getResults(test) # fetch the results table once and reuse it
# IDs of patients that appear more than once (each ID listed once)
dups <- unique(res$cases.submitter_id[duplicated(res$cases.submitter_id)])

# print the sample types recorded for each duplicated patient
for (id in dups) {
  print(id)
  print(res %>% filter(cases.submitter_id == id) %>% select(sample_type))
}


Any help appreciated, thanks in advance.

RNA-Seq R • 2.3k views
0

Thanks, I managed to download the whole matrix using this. There are still duplicated entries (e.g. two or more tumour samples for the same patient) with no obvious rationale for which to delete, but at least I have the whole matrix now.

(Apologies, this should be a reply to the answer above, but I can't seem to get that to work.)

0

Are you not able to use the ADD COMMENT button?

0

Hi, yes, it seems to be working now. Thanks.

2
2.3 years ago

Yes, TCGAbiolinks can give you a single matrix. Try this:

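library("TCGAbiolinks") # Bioconductor package
query_TCGA = GDCquery(
  project = "TCGA-LUAD", # required argument; the cohort asked about
  data.category = "Transcriptome Profiling", # parameter enforced by GDCquery
  experimental.strategy = "RNA-Seq",
  workflow.type = "HTSeq - Counts")

# download the files matched by the query; GDCprepare reads them from disk
GDCdownload(query = query_TCGA)

# combine all downloaded files into one SummarizedExperiment object
dat <- GDCprepare(query = query_TCGA, save = TRUE, save.filename = "exp.rda")

# exp matrix (genes in rows, samples in columns)
rna <- as.data.frame(SummarizedExperiment::assay(dat))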
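
For the patients with several tumour files, here is a minimal base R sketch of one way to spot them and keep a single sample each. It relies on the TCGA barcode layout, where the first 12 characters identify the patient and characters 14-15 give the sample type ("01" primary tumour, "11" solid tissue normal). The barcodes below are made up for illustration, and keeping the alphabetically first barcode is only an assumed, arbitrary rule, not an official recommendation.

```r
# Hypothetical barcodes for illustration only; in practice one would
# probably start from colnames(rna), assuming they hold full TCGA barcodes.
barcodes <- c(
  "TCGA-AA-0001-01A-11R-0000-07",
  "TCGA-AA-0001-01B-11R-0000-07",
  "TCGA-AA-0001-11A-11R-0000-07",
  "TCGA-AA-0002-01A-11R-0000-07"
)

samples <- data.frame(
  barcode     = barcodes,
  patient     = substr(barcodes, 1, 12),   # e.g. "TCGA-AA-0001"
  sample_type = substr(barcodes, 14, 15),  # "01" tumour, "11" normal
  stringsAsFactors = FALSE
)

# Patients that have more than one primary tumour sample
tumour <- samples[samples$sample_type == "01", ]
n_per_patient <- table(tumour$patient)
dup_patients <- names(n_per_patient)[n_per_patient > 1]

# Assumed rule: keep the alphabetically first barcode per patient
tumour <- tumour[order(tumour$barcode), ]
tumour_unique <- tumour[!duplicated(tumour$patient), ]
```

If the column names of rna are full barcodes, something like rna[, tumour_unique$barcode] would then subset the matrix to one tumour sample per patient. Whichever rule you pick, apply it consistently and report it.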

0

I used this code to download the data, but I get a Unicode issue like the one in the pictures here.

How can I solve this problem?

0

Please paste your code here so that other people can see what you have tried. Then they might be able to help with the issue.