Question: GSEA with TCGA data
Mike1.3k wrote:

Hello all,

I have TCGA expression data for the EGFR gene (a two-column file: sample_ID and expression_value), and I want to run gene set enrichment analysis using GSEA. How can I use this input file in the GSEA tool to see the enrichment in different gene sets?

Thanks

modified 3.4 years ago • written 3.4 years ago by Mike

Perhaps I am misunderstanding, but if you have data on only one gene, you will not be able to do gene set enrichment analysis. Could you clarify what you want to do and what data you have?

modified 3.4 years ago • written 3.4 years ago by Sean Davis

Actually, I have expression data for EGFR from TCGA. I divided the samples into two classes, "Low" and "High", on the basis of the expression value. Now I want to run gene set enrichment analysis for EGFR low vs. high.

I have the following two files:

file 1: exp.gct

1.2

1 400

NAME TCGAsample1 TCGAsample2....... ..sample400

EGFR 0.7859 7.3675 8.0040 ......

file 2: exp.cls

400 2 1

low high

low low low.....

modified 3.4 years ago • written 3.4 years ago by Mike
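
For readers preparing the same inputs, here is a minimal sketch of writing a multi-gene expression matrix and its class labels in the GCT and CLS layouts. It assumes the Broad-documented layout (a `#1.2` first line, a Description column after NAME, and a `#`-prefixed class-name line in the CLS file), which differs slightly from the abbreviated listing above. The function names `to_gct` and `to_cls`, the gene names other than EGFR, and all values are made up for illustration.

```python
def to_gct(expression, samples):
    """Build GCT text from a dict of gene -> list of values (one per sample)."""
    lines = ["#1.2", f"{len(expression)}\t{len(samples)}"]
    lines.append("\t".join(["NAME", "Description"] + list(samples)))
    for gene, values in expression.items():
        if len(values) != len(samples):
            raise ValueError(f"{gene}: expected {len(samples)} values, got {len(values)}")
        # "na" is a placeholder for the Description column.
        lines.append("\t".join([gene, "na"] + [f"{v:g}" for v in values]))
    return "\n".join(lines) + "\n"


def to_cls(labels):
    """Build CLS text from one class label per sample, in sample order."""
    classes = list(dict.fromkeys(labels))  # unique labels, first-appearance order
    header = f"{len(labels)} {len(classes)} 1"
    return "\n".join([header, "# " + " ".join(classes), " ".join(labels)]) + "\n"


# Toy data: illustrative values only, not real TCGA measurements.
samples = ["TCGAsample1", "TCGAsample2", "TCGAsample3", "TCGAsample4"]
expression = {
    "EGFR": [0.7859, 7.3675, 8.0040, 1.2000],
    "GENE_A": [2.1, 5.3, 6.0, 1.9],
    "GENE_B": [4.4, 1.0, 0.7, 4.1],
}
labels = ["low", "high", "high", "low"]

gct_text = to_gct(expression, samples)
cls_text = to_cls(labels)
print(gct_text)
print(cls_text)
```

The point of the sketch is that the GCT file carries one row per gene, so a matrix with only the EGFR row gives GSEA nothing to test gene sets against.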
Sean Davis25k wrote:

Gene set analysis needs to have a set of genes. It isn't possible to perform gene set analysis on the expression data of only one gene. The typical process is to have two groups of samples (could be your EGFR HI/LOW groups), perform differential expression on ALL genes, and then do Gene Set Analysis on the resulting differentially-expressed genes.

written 3.4 years ago by Sean Davis
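
The process described in this answer can be sketched roughly as follows: label each sample high or low by its EGFR value, then score every other gene for the difference between the two groups and sort. This is only an illustration under stated assumptions: the median cutoff, the simplified signal-to-noise statistic, the choice to leave EGFR itself out of the ranking, the function names, and the toy values are all hypothetical, and a real RNA-Seq analysis would normally use a dedicated differential expression method.

```python
from statistics import mean, median, stdev


def split_by_gene(expression, gene="EGFR"):
    """Label each sample 'high' or 'low' relative to the median of one gene.

    Samples exactly at the median are labelled 'low' (an arbitrary choice).
    """
    values = expression[gene]
    cutoff = median(values)
    return ["high" if v > cutoff else "low" for v in values]


def signal_to_noise(values, labels):
    """Simplified signal-to-noise: (mean_high - mean_low) / (sd_high + sd_low)."""
    high = [v for v, lab in zip(values, labels) if lab == "high"]
    low = [v for v, lab in zip(values, labels) if lab == "low"]
    spread = stdev(high) + stdev(low)  # needs at least two samples per group
    return (mean(high) - mean(low)) / spread if spread > 0 else 0.0


def rank_genes(expression, labels, exclude=("EGFR",)):
    """Return (gene, score) pairs sorted from most 'high'-associated to most 'low'."""
    scores = {
        gene: signal_to_noise(values, labels)
        for gene, values in expression.items()
        if gene not in exclude  # assumption: the grouping gene is left out
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy matrix: gene -> one value per sample (illustrative, not real TCGA data).
expression = {
    "EGFR": [0.8, 7.4, 8.0, 1.1, 6.9, 0.5],
    "GENE_A": [1.0, 5.0, 6.0, 1.5, 5.5, 0.9],  # tracks EGFR
    "GENE_B": [4.0, 1.0, 0.5, 3.8, 1.2, 4.1],  # moves opposite to EGFR
    "GENE_C": [2.0, 2.1, 1.9, 2.0, 2.2, 2.1],  # roughly flat
}

labels = split_by_gene(expression)
ranked = rank_genes(expression, labels)
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

The resulting ranked list (gene name plus score) is the kind of input a gene set test then works on.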
TriS3.9k wrote:

If I get your question right... you divided the patients into an EGFR-high group and an EGFR-low group. This means that you have expression levels for ALL genes in these two groups. You are now trying to use GSEA to evaluate the enrichment of some signatures (e.g. HALLMARK or KEGG or whatever) and see if there is a difference between the two groups that you created. Correct?

If that's the case, yes you can/could, but I think there are a couple of caveats. GSEA itself was designed for microarray data, while you have RNA-Seq data (I'd guess). You can normalize/analyze your data for GSEA as described here. There is also a Bioconductor package called SeqGSEA that might be closer to what you are looking for.

Personally, I think that if you normalize and transform your data correctly (and don't use FPKM), you should be fine using those data as input for GSEA.

hope this helps

written 3.4 years ago by TriS
Mike1.3k wrote:

Thanks Sean,

Sorry, I forgot to mention that I also have a third file (C2: curated gene sets downloaded from MSigDB).

But how can I perform differential expression for all genes on the basis of EGFR HI/LOW?

Is it possible or not?

Thanks

written 3.4 years ago by Mike
Mike1.3k wrote:

Thanks TriS,

Yes, you understood my question exactly. My data are mRNA expression z-scores (RNA Seq V2 RSEM), and I divided the tumor samples into high-EGFR and low-EGFR groups, BUT I did not include all genes; I have only the EGFR gene. Is that possible, or should I include all genes?

Thanks,

written 3.4 years ago by Mike

First things first: use the add comment function when replying, unless you are actually replying to the main question :)

If by EGFR genes you mean genes that are involved in the EGFR pathway, then you don't need to do any functional enrichment analysis, because you already know your genes are involved in the EGFR pathway (ok, maybe a few more too). The (very) general point of something like GSEA is to understand what the genes that change the most do and in which direction the pathway goes. This means that you start from a genome-wide experiment, not from a handful of genes. Therefore no, I wouldn't use GSEA only on the EGFR genes.

written 3.4 years ago by TriS

Thank you so much, TriS. Yes, you are right.

So first I should include all genes and divide the samples based on high/low EGFR, then use GSEA.

written 3.4 years ago by Mike

That is probably the way to go, yes. As I mentioned, there will need to be a differential expression test to get a ranked list of genes.

written 3.4 years ago by Sean Davis
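
To make concrete why a ranked list of genes is needed, here is a rough sketch of a GSEA-style running-sum enrichment score computed over such a list. It is a simplified illustration, not the GSEA tool's implementation: the function name `enrichment_score`, the gene names, and the scores are hypothetical, and significance testing by permutation is left out entirely.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked: list of (gene, score) pairs sorted from highest to lowest score.
    Walking down the list, the sum rises at genes in the set (in proportion
    to |score| ** weight) and falls at genes outside it. The score returned
    is the running-sum value with the largest absolute deviation from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    hit_weights = [abs(score) ** weight if hit else 0.0
                   for (_, score), hit in zip(ranked, hits)]
    total = sum(hit_weights)
    if total == 0:
        raise ValueError("all genes in the set have a score of zero")

    running = 0.0
    best = 0.0
    for w, hit in zip(hit_weights, hits):
        running += w / total if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list (illustrative): positive = higher in the EGFR-high group.
ranked = [("G1", 3.0), ("G2", 2.0), ("G3", 1.0),
          ("G4", -1.0), ("G5", -2.0), ("G6", -3.0)]

es_top = enrichment_score(ranked, {"G1", "G2"})     # set concentrated at the top
es_bottom = enrichment_score(ranked, {"G5", "G6"})  # set concentrated at the bottom
print(es_top, es_bottom)
```

A set whose members cluster at the top of the list gets a positive score and one clustered at the bottom gets a negative score, which is only meaningful when the list covers many genes rather than EGFR alone.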