Gene enrichment analysis with a simple gene list compared to microarray data
9.2 years ago
nash.claire ▴ 490

Hi,

I wonder if anyone can help. I have a list of candidate genes from a previous proteomics experiment (just a simple list of genes), and I'd like to see whether they are enriched in some publicly available gene expression microarray data sets. Is there a bioinformatics tool that will let me compare a simple gene list to microarray data and still get some statistics back?

If not, can anyone suggest a different way to approach this analysis? Is it possible to do some sort of correlation analysis when I'm essentially comparing one list of gene names to another?

I have tried NetVenn, but it requires me to input two gene lists and then compare them to gene expression microarray data, rather than a single gene list.

I really look forward to hearing back!

gene genome • 3.4k views

Hi,

Thank you both very much for your advice. I will perhaps give iPathway a try. Is this free software or subscription only?


It is 100% free to use. You can upload as much data as you wish. Results are available for 72 hours, after which you can purchase the report to keep it long term. You can purchase either a single report or a subscription. The point is that you can see all of your data for free, then purchase if it makes sense.

If you sign up, let me know, and I'll be happy to give you three free reports to keep. Just mention you learned about it here. This will allow you to get to your comparison for free.

9.2 years ago

You can't really compare a gene list to deposited microarray data in an automated way, because the results of these experiments are not stored in a searchable format; only the original data are.

What you could do is perform an enrichment analysis on your gene list, identify functions of interest, and then search the literature for publications that studied the same system. Downloading their results and gene lists would perhaps give you something to compare against.
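Once a published gene list has been downloaded, the comparison itself can be as simple as a set overlap. The following is a minimal sketch in Python, not a definitive method: the function names, the upper-casing of symbols, and the placeholder gene names are all assumptions made for illustration.

```python
def normalize(symbols):
    """Strip whitespace and upper-case symbols so that 'Tp53 ' matches 'TP53'.
    (Assumption: case-insensitive matching is acceptable for your identifiers.)"""
    return {s.strip().upper() for s in symbols if s.strip()}


def overlap_summary(candidates, published):
    """Return the shared genes, their count, and the Jaccard index of two lists."""
    a, b = normalize(candidates), normalize(published)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "n_shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Placeholder symbols, not real data.
candidates = ["geneA ", "GeneB", "GENEC"]
published = ["GENEB", "genec", "GENED"]
summary = overlap_summary(candidates, published)
print(summary)
```

An overlap count on its own says nothing about significance; it only tells you which genes to look at more closely.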

9.2 years ago
andrew ▴ 560

Please keep in mind that one of the key limitations to any enrichment analysis is that it assumes the variables are independent, but we know that genes are highly dependent on each other in various systems. So you will likely get a number of false positives using any kind of gene set enrichment.

We offer a tool called iPathwayGuide that will "almost" do what you are looking for. It still requires you to upload two sets of data. Soon, however, we will offer the ability to process publicly available data (e.g. from NCBI-GEO) and then input your list of genes to see which systems those genes are implicated in for that phenotype comparison.

For now, however, you can find a representative public data set, run it through GEO2R, and upload the resulting differential expression data into iPathwayGuide. Then reprocess the same data, but artificially make your genes of interest DE (differentially expressed) by giving them a significant p-value (e.g. 0.01) and all other genes an insignificant one (e.g. 0.5). The key is to preserve the logFC, because one of the main analyses we perform is a perturbation analysis: iPathwayGuide takes the gene expression for your target genes and propagates that perturbation downstream. From this we can identify which pathways are most perturbed. This method virtually eliminates false positives. Finally, compare the two data sets using our meta-analysis to confirm where any overlap occurs.

Here's a screenshot of the meta analysis for pathways comparing three datasets.

[Screenshot not available]
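The relabelling step in the GEO2R workaround above (a significant p-value for the genes of interest, an insignificant one for everything else, logFC left untouched) could be scripted roughly as follows. This is a sketch under assumptions: the column names `gene`, `logFC` and `pvalue` are hypothetical and need to be matched to the headers of your actual export, and the three-row table is made up purely to show the mechanics.

```python
import csv
import io


def mark_genes_of_interest(rows, genes_of_interest,
                           gene_col="gene", p_col="pvalue",
                           sig_p=0.01, nonsig_p=0.5):
    """Return copies of the rows with the p-value column overwritten:
    sig_p for genes of interest, nonsig_p for all others.
    Every other column, including logFC, is left unchanged."""
    targets = {g.strip().upper() for g in genes_of_interest}
    relabelled = []
    for row in rows:
        new_row = dict(row)
        is_target = row[gene_col].strip().upper() in targets
        new_row[p_col] = sig_p if is_target else nonsig_p
        relabelled.append(new_row)
    return relabelled


# Made-up, tab-delimited table standing in for a differential expression export.
table = (
    "gene\tlogFC\tpvalue\n"
    "GENE_A\t1.8\t0.20\n"
    "GENE_B\t-0.4\t0.001\n"
    "GENE_C\t-2.1\t0.03\n"
)
rows = list(csv.DictReader(io.StringIO(table), delimiter="\t"))
relabelled = mark_genes_of_interest(rows, ["gene_a", "GENE_C"])

# Write the relabelled table back out in the same tab-delimited layout.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["gene", "logFC", "pvalue"],
                        delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(relabelled)
print(out.getvalue())
```

The function copies each row rather than editing it in place, so the original differential expression table stays available for the first upload.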

6.1 years ago
a_liberzon • 0

Try our Investigate Gene Sets tool online at http://software.broadinstitute.org/gsea/msigdb/index.jsp. Note that your list should not exceed 2,000 gene (or protein) identifiers, and that the online tool shows at most the 100 most significant results. If you have many such lists, I'd recommend downloading our gene sets from MSigDB and implementing the hypergeometric test offline, for example by calling the phyper() function in R.
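For anyone who would rather not use R, the same one-sided hypergeometric test can be sketched in plain Python. The function names and the toy universe below are assumptions for illustration; the tail probability computed here, P(X >= overlap), corresponds to `phyper(overlap - 1, set_size, universe_size - set_size, list_size, lower.tail = FALSE)` in R.

```python
from math import comb


def hypergeom_upper_tail(overlap, set_size, universe_size, list_size):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from a universe of universe_size genes, set_size of which are in the gene set."""
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)
    favourable = sum(
        comb(set_size, i) * comb(universe_size - set_size, list_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return favourable / total


def enrichment_p(gene_list, gene_set, universe):
    """Restrict both inputs to the universe, then test their overlap."""
    universe = set(universe)
    gene_list = set(gene_list) & universe
    gene_set = set(gene_set) & universe
    overlap = len(gene_list & gene_set)
    p = hypergeom_upper_tail(overlap, len(gene_set), len(universe), len(gene_list))
    return overlap, p


# Toy example with placeholder identifiers: 10-gene universe, 3-gene set, 4-gene list.
universe = [f"g{i}" for i in range(1, 11)]
gene_set = ["g1", "g2", "g3"]
gene_list = ["g1", "g2", "g4", "g5"]
overlap, p_value = enrichment_p(gene_list, gene_set, universe)
print(overlap, p_value)  # overlap 2, p = 70/210, i.e. about 0.333
```

The choice of universe (for example, all genes measurable on the platform) strongly affects the p-value, and when testing many gene sets the resulting p-values should be corrected for multiple testing.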
