Question: Functional Enrichment With Large Numbers Of Genes
8.3 years ago, Rubal7 wrote:

Hello all,

I have a large list of genes that come from genomic regions identified using a population genetic test statistic on genome-wide sequence data (not expression data). I would like to see whether this list is enriched for any particular biological categories. However, the list is very large (hundreds of genes), and it seems that DAVID and Panther are unable to handle a list of that size. Does anyone know of gene list enrichment software that, as far as people are aware, is not constrained to a limited number of genes?

Thanks in advance,


8.3 years ago, SES (Vancouver, BC) wrote:

I have used Ontologizer on large data sets without any problems. Also, you might want to try searching through the archives of this site (if you haven't already). I know that is not specific, but GO-related questions come up frequently and you might be able to dig up something useful. Good luck.

8.3 years ago, tiagoantao (United States of America) wrote:

I am going to take a leap of faith here and imagine that your statistic is something like Fst, iHS or XP-EHH. If that is the case, then you will have a set of genes around each region of interest (most such tests do not have the spatial precision to pinpoint a gene, only a window). This means that your genes will be clustered around the signal. Neighbouring genes often share similar GO terms, so several genes from the same region can carry the same term and inflate its apparent enrichment.

All this to say that you might have to do your analysis window-based (typically 200 kb windows in humans) and not gene-based. Most GO tools are not made with pop gen statistics in mind and are gene-based (as you know by now).
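To make the window-based idea concrete, here is a minimal sketch under stated assumptions. It is only an illustration, not a published method or any particular tool's approach: each GO term is counted at most once per window, and enrichment is tested over windows with a one-sided hypergeometric test. The function names, window IDs, gene names and GO labels are all hypothetical toy values, and the test ignores differences in window size and gene density.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

def window_terms(windows, gene2go):
    """Collapse genes to windows: each window gets the set of GO terms
    carried by any of its genes, so a term counts once per window."""
    return {w: set().union(*(gene2go.get(g, set()) for g in genes))
            for w, genes in windows.items()}

def window_enrichment(windows, candidates, gene2go):
    """Return {term: (hits in candidates, windows with term, p-value)}."""
    terms = window_terms(windows, gene2go)
    N, n = len(terms), len(candidates)
    results = {}
    for term in set().union(*(terms[w] for w in candidates)):
        K = sum(1 for t in terms.values() if term in t)
        k = sum(1 for w in candidates if term in terms[w])
        results[term] = (k, K, hypergeom_sf(k, N, K, n))
    return results

# Hypothetical toy data: window w1 holds a cluster of three genes
# that all share the same (made-up) GO term.
windows = {"w1": ["g1", "g2", "g3"], "w2": ["g4"], "w3": ["g5"],
           "w4": ["g6"], "w5": ["g7"], "w6": ["g8"]}
gene2go = {"g1": {"GO:A"}, "g2": {"GO:A"}, "g3": {"GO:A"}, "g4": {"GO:B"},
           "g5": {"GO:A"}, "g6": {"GO:B"}, "g7": {"GO:C"}, "g8": {"GO:C"}}
results = window_enrichment(windows, ["w1", "w2"], gene2go)
for term, (k, K, p) in sorted(results.items()):
    print(term, k, K, round(p, 3))
```

In this toy case the three clustered genes in w1 contribute a single hit for their shared term, whereas a gene-based test would count them three times. A real analysis would also need multiple-testing correction and a sensible choice of background windows.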

This might not be your case, but if you are using standard pop gen stuff, you might have to add an extra layer of analysis.

I am aware that this is the opposite of an answer: I am raising yet another problem. But if you are doing standard pop gen stuff, you will have to deal with the lack of spatial precision of the statistics and use window-based approaches instead of gene-based approaches.

I suggest reading the Grossman & Sabeti paper in Science for an idea of the problem of spatial precision with pop gen stats (selection, in that case). Please note that I am not suggesting that you use their solution; it is just a useful read on the problem of precision.

There are papers doing GO analysis with pop gen stats and window approaches. I do not have any at hand, but you can search for them. Window-based approaches (not GO) are well represented in Pickrell et al., "Signals of recent positive selection in a worldwide sample of human populations".


Thanks for raising this concern. You are right, the issue of windows does complicate the validity of GO analyses. I'll go back to these papers as food for thought. The windows I have are also particularly large, some of them several megabases, as I am looking for the longest regions of homozygosity, which means I have very large gene lists that will be diluting the true signal.

(reply written 8.3 years ago by Rubal7)

Later, if you need it, I have some code (Python) to get all GO terms for a genomic region and calculate enrichment. I have not published it, but I would have no problem giving it to you.
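For readers wondering what "all GO terms for a genomic region" could look like in practice, the following is a minimal sketch. It is not the unpublished code mentioned above. It assumes gene coordinates are available as simple (name, chromosome, start, end) tuples and that a gene-to-GO mapping has already been loaded; the function names, gene names, coordinates and GO labels are all hypothetical.

```python
def genes_in_region(genes, chrom, start, end):
    """Names of genes overlapping the closed interval [start, end] on chrom.
    genes: iterable of (name, chrom, gene_start, gene_end) tuples."""
    return [name for name, c, s, e in genes
            if c == chrom and s <= end and e >= start]

def go_terms_for_region(genes, gene2go, chrom, start, end):
    """Map each GO term found in the region to the genes that carry it."""
    terms = {}
    for name in genes_in_region(genes, chrom, start, end):
        for term in gene2go.get(name, ()):
            terms.setdefault(term, []).append(name)
    return terms

# Hypothetical toy annotation.
genes = [("geneA", "chr1", 100, 500), ("geneB", "chr1", 450, 900),
         ("geneC", "chr1", 2000, 2500), ("geneD", "chr2", 100, 500)]
gene2go = {"geneA": {"GO:X"}, "geneB": {"GO:X", "GO:Y"},
           "geneC": {"GO:Z"}, "geneD": {"GO:X"}}

region_terms = go_terms_for_region(genes, gene2go, "chr1", 400, 1000)
print(region_terms)
```

Keeping the list of genes behind each term makes it easy to see when one term is supported by several neighbouring genes in the same region.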

(reply written 8.3 years ago by tiagoantao)

Thanks, that could be very useful. I may get in touch soon.

(reply written 8.3 years ago by Rubal7)