
Issues in Gene Set Enrichment Analysis using TCGAbiolinks

7.3 years ago

ammarsabir15 ▴ 70

I want to perform gene set enrichment analysis on the TCGA Glioblastoma Multiforme (GBM) dataset using GO or KEGG pathways. For this purpose, I downloaded the data from TCGA using this code:

library(TCGAbiolinks)
query <- GDCquery(project = "TCGA-GBM",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification")

GDCdownload(query)

This downloaded the data according to the given parameters, but the problem appeared when I tried to prepare the query using the command below:

 data <- GDCprepare(query)

I then got the following error: "Unable to prepare query there are duplicates in the data". I tried to find the duplicates using fdupes, but it found no duplicate files in the dataset.
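One possible explanation, offered as a sketch rather than a confirmed diagnosis: fdupes compares file contents, whereas the duplication may be at the sample level, with the same sample appearing once per workflow in the query results. The toy table below uses made-up sample IDs and hypothetical column names (`cases`, `workflow`) to show how such duplicates can exist without any two files being identical, and how keeping a single workflow removes them.

```r
# Toy stand-in for a query result table. The sample IDs and the column
# names `cases` and `workflow` are hypothetical, chosen for illustration.
results <- data.frame(
  cases    = c("S1", "S1", "S1", "S2", "S2", "S2"),
  workflow = rep(c("HTSeq - Counts", "HTSeq - FPKM", "HTSeq - FPKM-UQ"),
                 times = 2),
  stringsAsFactors = FALSE
)

# Each sample appears once per workflow, so sample IDs repeat even though
# the underlying files differ in content.
has_duplicates <- anyDuplicated(results$cases) > 0

# Keeping a single workflow leaves one row per sample.
counts_only <- results[results$workflow == "HTSeq - Counts", ]
still_duplicated <- anyDuplicated(counts_only$cases) > 0

print(has_duplicates)
print(still_duplicated)
```

In TCGAbiolinks, the analogous restriction would be the `workflow.type` argument of `GDCquery` (for example `workflow.type = "HTSeq - Counts"`). Whether that resolves this particular error is an assumption, not something verified here.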

I have the following questions:

1. How can this error be resolved?
2. For enrichment analysis, do I need the datasets from all workflows (HTseq_counts, HTseq_FPKM and HTseq_FPKM_UQ), or will one or two of them suffice?
3. Once I have the data, what are the next steps to perform the enrichment analysis using GO or KEGG pathways?
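For the last question above, the statistical core of a simple enrichment test can be sketched in base R. Note that this is over-representation analysis (a hypergeometric test on a gene list), a simpler relative of rank-based GSEA, and all numbers below are made up for illustration.

```r
# Made-up numbers for illustration only.
universe_size <- 20000  # genes measured in the experiment
pathway_size  <- 100    # genes annotated to one GO term or KEGG pathway
de_genes      <- 500    # differentially expressed genes
overlap       <- 10     # differentially expressed genes in the pathway

# Probability of seeing at least `overlap` pathway genes among the
# differentially expressed genes under the hypergeometric null.
# phyper(q, ...) with lower.tail = FALSE gives P(X > q), hence overlap - 1.
p_value <- phyper(overlap - 1,
                  pathway_size,
                  universe_size - pathway_size,
                  de_genes,
                  lower.tail = FALSE)

# Overlap expected by chance alone.
expected <- de_genes * pathway_size / universe_size

print(expected)
print(p_value)
```

In practice this test is repeated for every pathway and the p-values are corrected for multiple testing, for example with `p.adjust`. Packages such as clusterProfiler (`enrichGO`, `enrichKEGG`) wrap this kind of workflow.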

bioconductor TCGAbiolinks R
