I want to perform Gene Set Enrichment Analysis (GSEA) on the Glioblastoma Multiforme dataset in TCGA (TCGA-GBM) using GO or KEGG pathways. For this purpose I downloaded the data from TCGA with this code:
```r
library(TCGAbiolinks)

query <- GDCquery(project = "TCGA-GBM",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification")
GDCdownload(query)
```
This downloaded the data according to the given parameters, but when I tried to prepare the query with the command below:
```r
data <- GDCprepare(query)
```
I got the following error: `Unable to prepare query there are duplicates in the data`. I tried to remove duplicates with fdupes, but it found no duplicate files in the downloaded data.
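fdupes compares file contents, so it cannot see duplicates that exist only at the sample level. A plausible explanation (an assumption, not confirmed by the output above) is that the query, which sets no `workflow.type`, matched the HTSeq-Counts, HTSeq-FPKM and HTSeq-FPKM-UQ files for every sample, so each sample appears three times when `GDCprepare()` tries to build a single matrix. The base-R sketch below uses an invented results table (hypothetical sample IDs and file names) to show how such sample-level duplicates can be detected, and how keeping one workflow removes them.

```r
# Hypothetical stand-in for a GDC query result table: one row per file.
# Sample IDs and file names are invented for illustration only.
results <- data.frame(
  sample   = rep(c("SAMPLE-01", "SAMPLE-02"), each = 3),
  workflow = rep(c("HTSeq - Counts", "HTSeq - FPKM", "HTSeq - FPKM-UQ"), times = 2),
  file     = sprintf("file_%d.txt.gz", 1:6),
  stringsAsFactors = FALSE
)

# Every file name is unique, so a file-level duplicate check finds nothing...
any(duplicated(results$file))

# ...but each sample occurs once per workflow, i.e. a sample-level duplicate.
dup_samples <- unique(results$sample[duplicated(results$sample)])
print(dup_samples)

# Keeping a single workflow leaves exactly one file per sample.
counts_only <- results[results$workflow == "HTSeq - Counts", ]
any(duplicated(counts_only$sample))
```

If that is the cause here, the usual remedy in TCGAbiolinks is to pass a single `workflow.type` to `GDCquery()`, for example `workflow.type = "HTSeq - Counts"`; in current GDC data releases the HTSeq workflows have been retired and the gene counts are provided under `"STAR - Counts"` instead.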
So I have the following questions:

- How can this error be fixed?
- For enrichment analysis, do I need the data from all three workflows (HTSeq-Counts, HTSeq-FPKM and HTSeq-FPKM-UQ), or is one or two of them sufficient?
- Once I have the data, what are the next steps to perform the enrichment analysis with GO or KEGG pathways?
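On the workflow question: the three HTSeq outputs are not independent measurements. According to the GDC mRNA-seq pipeline documentation, FPKM and FPKM-UQ are both computed from the same HTSeq read counts and gene lengths. The base-R sketch below uses a small invented count matrix and hypothetical gene lengths to show that derivation; as a simplification (an assumption of this sketch), FPKM divides by each sample's total count over all genes, whereas GDC uses the reads mapped to protein-coding genes only.

```r
# Invented raw counts: rows are genes, columns are samples (hypothetical data).
counts <- matrix(
  c(100, 200, 300, 400,
     50, 150, 250, 550),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2"))
)
# Hypothetical gene lengths in base pairs, one per row of `counts`.
gene_length <- c(gene1 = 1000, gene2 = 2000, gene3 = 1500, gene4 = 500)

# FPKM: scale each count by gene length and by the sample's total count.
# (Simplification: GDC divides by reads on protein-coding genes only.)
fpkm <- function(counts, len) {
  lib_size <- colSums(counts)
  sweep(counts * 1e9 / len, 2, lib_size, "/")
}

# FPKM-UQ: same scaling, but normalise by the sample's 75th-percentile gene count.
fpkm_uq <- function(counts, len) {
  uq <- apply(counts, 2, quantile, probs = 0.75, names = FALSE)
  sweep(counts * 1e9 / len, 2, uq, "/")
}

round(fpkm(counts, gene_length), 1)
round(fpkm_uq(counts, gene_length), 1)
```

Because both normalised values can be derived from the counts, one workflow is normally enough; raw counts are the usual input for count-based differential-expression tools such as DESeq2 or edgeR, whose results can then be used to rank genes for enrichment analysis.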
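On the next steps: whichever tool is used, GSEA needs a ranked gene list (for example genes ordered by a differential-expression statistic between two groups of samples) and a collection of gene sets such as GO terms or KEGG pathways. The base-R sketch below computes the GSEA running-sum enrichment score for a single gene set, with hit steps weighted by |statistic|^p. The gene names, statistics and gene sets are hypothetical, and the sketch omits the permutation-based p-values and multiple-testing correction that packages such as clusterProfiler (`gseGO()`, `gseKEGG()`) or fgsea provide.

```r
# Enrichment score of one gene set against a ranked list, following the
# GSEA running-sum statistic with weight exponent p (p = 1 by default).
enrichment_score <- function(stats, gene_set, p = 1) {
  stats <- sort(stats, decreasing = TRUE)   # rank genes by their statistic
  in_set <- names(stats) %in% gene_set
  n_hit <- sum(in_set)
  n_miss <- length(stats) - n_hit
  if (n_hit == 0 || n_miss == 0) {
    stop("gene set must partially overlap the ranked list")
  }

  weights <- abs(stats)^p
  # Step up at genes in the set (weighted by |stat|^p), step down evenly otherwise.
  hit_step <- ifelse(in_set, weights / sum(weights[in_set]), 0)
  miss_step <- ifelse(in_set, 0, 1 / n_miss)
  running <- cumsum(hit_step - miss_step)

  # The enrichment score is the running sum's maximum deviation from zero.
  running[which.max(abs(running))]
}

# Hypothetical ranked statistics (e.g. log fold changes) and two gene sets.
stats <- c(g1 = 3, g2 = 2, g3 = 1, g4 = -1, g5 = -2, g6 = -3)
set_up <- c("g1", "g2")
set_down <- c("g5", "g6")

enrichment_score(stats, set_up)    # set concentrated at the top: positive score
enrichment_score(stats, set_down)  # set concentrated at the bottom: negative score
```

With real data, `stats` would be a named vector of per-gene statistics from the differential-expression step, and the gene sets would come from GO or KEGG annotations for the same gene identifiers.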