Question: Issues in Gene Set Enrichment Analysis using TCGAbiolinks
Asked 2.2 years ago by ammarsabir15:

I want to perform gene set enrichment analysis on the Glioblastoma Multiforme dataset in TCGA using GO or KEGG pathways. For this purpose I downloaded the data from TCGA using this code:

library(TCGAbiolinks)
query <- GDCquery(project = "TCGA-GBM",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification")

GDCdownload(query)

This downloaded the data according to the given parameters, but the problem came when I tried to prepare the query using the command below:

 data <- GDCprepare(query)

I then got the following error: "Unable to prepare query there are duplicates in the data". I tried to remove duplicates using fdupes, but it found no duplicate files in the datasets.

So I have the following questions:

  • How can this error be resolved?

  • For enrichment analysis, do I need the datasets from all workflows, i.e. HTseq_counts, HTseq_FPKM and HTseq_FPKM_UQ, or will any one or two of them suffice?

  • Once I have the data, what are the next steps to perform the enrichment analysis using GO or KEGG pathways?
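
One possible cause of the duplicates error, offered as an assumption rather than a confirmed diagnosis: the query above sets no `workflow.type`, so it may return several files per sample (one per workflow), and `GDCprepare` cannot merge them into a single matrix. A minimal sketch that restricts the query to one workflow is shown here. The workflow name `"HTSeq - Counts"` is an assumption and depends on which workflows the GDC data release offers for this project.

```r
library(TCGAbiolinks)

# Assumption: limiting the query to a single workflow gives one file per sample,
# so GDCprepare no longer sees the same sample more than once.
# "HTSeq - Counts" is an assumed workflow name; check the workflow types
# available for TCGA-GBM in your GDC data release before running this.
query <- GDCquery(project = "TCGA-GBM",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")

# Download only the files matched by the narrowed query, then prepare them.
GDCdownload(query)
data <- GDCprepare(query)
```

If this is indeed the cause, a single workflow should be enough to get started. Raw counts are the usual input for count-based differential expression tools, whose results can then feed a GO or KEGG enrichment step.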
Tags: bioconductor, R, tcgabiolinks