Question: How to perform Gene Set Enrichment Analysis
Mo wrote:

Dear all,

I have expression data from three cells, each covering many genes. For example, one of my data sets looks like this:

Gene Name    Drug1      Drug2      Drug3
1007_s_at    -0.2815    -0.2032    -0.2539
1053_at      -0.0113     0.0285    -0.0675
117_at       -0.0448    -0.136     -0.2189
121_at       -0.081      0.1412     0.0464

From my searching, I understand that I should run a t-test, then obtain p-values, and so on. There are many functions in R, as well as Java tools, that can be used for GSEA. However, I am stuck at the first step: how do I prepare the data set, and then how do I analyze it? I don't mind which software I use. Could one of you please help me with how to do this?

I am looking forward to hearing from you.

modified 5.1 years ago by RamRS • written 5.1 years ago by Mo

So you are working with Arabidopsis, using Affymetrix GeneChips, and have no replicates, right?

Without replicates there is no way to get p-values for gene ranking. I think the most important 'pre-processing' step is to get the raw data with biological replicates.

written 5.1 years ago by Michael Dondrup

No, this is not Arabidopsis. I don't have replicates for a single cell, but I have the same drugs and the same genes measured in three different cells.

written 5.1 years ago by Mo

So you do have biological replicates. That contradicts your example, so please be more precise with your examples. You are not giving enough details: what is your organism, then? This is important because it determines where to look for GO annotation.

written 5.1 years ago by Michael Dondrup

I have data for three cells from animal liver. I only gave an example matrix to show what the data look like. Imagine that I have three matrices like the example above.

written 5.1 years ago by Mo
al3n70rn wrote:

Have a look at the GSEA documentation; it is really straightforward:

http://www.broadinstitute.org/gsea/doc/desktop_tutorial.jsp

http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats

written 5.1 years ago by al3n70rn
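
For readers preparing input by hand: the data formats page linked above describes plain tab-delimited files. As a minimal sketch, assuming the GCT expression format (a `#1.2` version line, a line with the row and column counts, then a `NAME`/`Description` header followed by one row per probe set), a table like the one in the question could be serialized as below. The function name `to_gct` and the `na` placeholder description are illustrative choices, not part of the GSEA tools, and the layout should be checked against the linked documentation before use.

```python
def to_gct(sample_names, rows):
    """Serialize {probe_id: [values, ...]} as tab-delimited GCT text.

    Assumes the GCT layout: '#1.2', then '<rows>\t<columns>', then a
    header line, then one line per probe set.
    """
    lines = ["#1.2", f"{len(rows)}\t{len(sample_names)}"]
    lines.append("\t".join(["NAME", "Description"] + list(sample_names)))
    for probe_id, values in rows.items():
        if len(values) != len(sample_names):
            raise ValueError(f"{probe_id}: expected {len(sample_names)} values")
        # 'na' is a placeholder description (illustrative choice).
        lines.append("\t".join([probe_id, "na"] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"


# Rows copied from the example matrix in the question.
expression = {
    "1007_s_at": [-0.2815, -0.2032, -0.2539],
    "1053_at": [-0.0113, 0.0285, -0.0675],
    "117_at": [-0.0448, -0.136, -0.2189],
    "121_at": [-0.081, 0.1412, 0.0464],
}
gct_text = to_gct(["Drug1", "Drug2", "Drug3"], expression)
```

Writing `gct_text` to a file with a `.gct` extension would give something the desktop tool can be pointed at; the gene sets and phenotype labels still have to be supplied separately.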
dago wrote:

The format of your data really depends on the program you are going to use. If you want to perform a gene set analysis (GSA) using GO, you need to:

  1. annotate your genes with GO terms
  2. define a test group, which I guess is the list of genes you are referring to, and a background group, which serves as the comparison

Look at previous posts for more details; the procedure is the same for different organisms:

A: How can I do GO enrichment analysis for bacteria genome? (biomaRt is not support

Below are a few papers on the topic:

http://www.pnas.org/content/102/43/15545.short

http://www.biomedcentral.com/1471-2105/6/144

http://www.biomedcentral.com/1471-2105/10/47

modified 5.1 years ago • written 5.1 years ago by dago
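
The two steps in the answer above (annotate the genes, then compare a test group against a background group) are commonly evaluated with a hypergeometric over-representation test. The sketch below assumes that test; the answer itself does not name one. It handles a single GO term, and the function name and the gene identifiers are hypothetical.

```python
from math import comb


def enrichment_p_value(test_genes, background_genes, annotated_genes):
    """One-sided hypergeometric test for one GO term.

    Returns (k, p): k is the number of test genes carrying the term and
    p is the probability of seeing k or more such genes when drawing
    len(test) genes at random from the background.
    """
    background = set(background_genes)
    test = set(test_genes) & background
    annotated = set(annotated_genes) & background
    n_background, n_annotated, n_test = len(background), len(annotated), len(test)
    k = len(test & annotated)
    total = comb(n_background, n_test)
    favourable = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_test - i)
        for i in range(k, min(n_annotated, n_test) + 1)
    )
    return k, favourable / total


# Hypothetical identifiers: 20 background genes, 5 of them carry the term.
background = [f"g{i}" for i in range(20)]
with_term = ["g0", "g1", "g2", "g3", "g4"]
overlap, p_value = enrichment_p_value(["g0", "g1", "g2", "g10"], background, with_term)
```

In practice this is repeated for every GO term and the resulting p-values are corrected for multiple testing.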

Thanks for your answer. I definitely agree that each software package needs its own data structure and, of course, uses a different strategy from the others. However, my question is this: say I have a cell with many genes (as shown above) and I want to perform GSEA. I don't want to define a test group and a background group myself, because I have no idea which genes differ significantly from the others for a given biological activity or question. In other words, I know nothing about any specific gene, so how do I perform such an analysis? Some might say to take the mean of each column (drug) and base the analysis on that. I don't know, so I would like to hear what people think and how I could perform such an analysis on this example set.

written 5.1 years ago by Mo

Well, I do not quite get what you want. If you do not know anything about your genes, you can just look for co-expression patterns. I would say you could look at correlations in the drug groups. Otherwise, you could look for significant differences in expression between groups.

written 5.1 years ago by dago
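
As a rough illustration of the correlation idea in the comment above, the sketch below computes Pearson correlation coefficients between the drug columns of the example matrix. Choosing Pearson (rather than, say, a rank correlation) is an assumption, and four probe sets are far too few for a meaningful estimate; the values are only there to show the mechanics.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Columns of the example matrix from the question (one value per probe set).
drugs = {
    "Drug1": [-0.2815, -0.0113, -0.0448, -0.081],
    "Drug2": [-0.2032, 0.0285, -0.136, 0.1412],
    "Drug3": [-0.2539, -0.0675, -0.2189, 0.0464],
}

# One coefficient per unordered pair of drugs.
correlations = {
    (a, b): pearson(drugs[a], drugs[b])
    for a in drugs
    for b in drugs
    if a < b
}
```

The same function applied to rows instead of columns would compare genes rather than drugs, which is closer to a co-expression analysis.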

I think both correlation and significant differences in expression between groups would make sense and would be good to try. Can you please tell me how to do it? Do I simply compute a correlation coefficient?

written 5.1 years ago by Mo

Michael Dondrup wrote:

I think the difficulty is that you haven't really thought this through, and you seem to be lacking an experimental question. If you do in fact have a specific question for your data, then you should make it explicit. It is not a good idea to pick an analysis approach first and then make the question and data fit somehow; in my opinion this is sort of the 'opposite' of the scientific method.

Gene set enrichment analysis needs gene sets. That's obvious, but it is hard to define sensible sets without an experimental question. If you don't have a good hypothesis, then you might try a GO term enrichment test instead of GSEA.

GSEA assumes that the genes can be ordered by a value (low to high); differential expression values might be used for this. An interesting contrast could be drug ~ control, testing whether a set of known cancer-associated genes is enriched (this is just an example). Remember that if you cannot come up with any sensible gene set, then GSEA is not suitable.

modified 5.1 years ago • written 5.1 years ago by Michael Dondrup
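
To make the "ordered genes plus a gene set" requirement concrete, here is a minimal sketch of a running-sum enrichment score in the style of the GSEA method. It is not the Broad implementation: the function name, the `weight` parameter, and the gene names are assumptions, and the significance step (permutations) is left out. The sketch sorts from high to low, so a positive score means the set is concentrated among the highest values.

```python
def enrichment_score(scores, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    scores: {gene: ranking value}, e.g. a differential expression statistic.
    Walks the genes from highest to lowest value, stepping up at set
    members (weighted by |value| ** weight) and down at all other genes,
    and returns the largest deviation from zero.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = [g for g in ranked if g in gene_set]
    if not hits or len(hits) == len(ranked):
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_norm = sum(abs(scores[g]) ** weight for g in hits)
    if hit_norm == 0:
        raise ValueError("all set members have a zero ranking value")
    miss_step = 1.0 / (len(ranked) - len(hits))

    running, best = 0.0, 0.0
    for gene in ranked:
        if gene in gene_set:
            running += abs(scores[gene]) ** weight / hit_norm
        else:
            running -= miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking values and gene sets, for illustration only.
ranking = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0, "e": -1.0, "f": -2.0}
es_top = enrichment_score(ranking, {"a", "b"})     # set sits at the top
es_bottom = enrichment_score(ranking, {"e", "f"})  # set sits at the bottom
```

A set clustered at either end of the ranking gives a score near +1 or -1, while a set scattered through the list stays closer to zero.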

Thanks for your comment. I certainly do have a question: I want to see which genes are up- or down-regulated.

However, if I have to supply something like 0000 1111 myself, then what is the point of the analysis? If I know this in advance, why would I need to perform the analysis at all? I want to see whether I can find this discrimination from the available data set.

written 5.1 years ago by Mo

So you want to do differential expression analysis: look at the limma Bioconductor package.

GSEA doesn't apply here.

written 5.1 years ago by Michael Dondrup
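
limma is an R package, so the following is not limma and does not reproduce its moderated statistics. It is only a plain per-gene one-sample t-statistic, sketched under two assumptions that the thread does not confirm: that the values are log-ratios against a control, and that the three cells can be treated as replicates. In each list, the first value comes from the example matrix in the question; the other two are made-up placeholders.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values):
    """t-statistic for the hypothesis that the mean of `values` is zero."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    sd = stdev(values)
    if sd == 0:
        raise ValueError("t-statistic is undefined when all values are equal")
    return mean(values) / (sd / sqrt(n))


# Drug1 values per probe set across three cells.
# First value: from the question. Second and third: made-up placeholders.
drug1_by_cell = {
    "1007_s_at": [-0.2815, -0.30, -0.25],
    "1053_at": [-0.0113, 0.02, -0.01],
    "117_at": [-0.0448, -0.06, -0.03],
}

t_stats = {probe: one_sample_t(v) for probe, v in drug1_by_cell.items()}

# Probe sets ordered by strength of evidence for up- or down-regulation.
ranked = sorted(t_stats, key=lambda probe: abs(t_stats[probe]), reverse=True)
```

With only three replicates the per-gene variance estimates are unstable, which is one reason a dedicated package is recommended for this kind of analysis.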