I have a list of 1000 genes (Entrez IDs), which is a subset of about 8000 genes expressed in a tumor tissue, and I want to test it for GO enrichment, TFBS enrichment, and so on.
I know there are a variety of online tools available. However, most of them take only the 1000-gene subset and return a p-value or FDR, which is very biased, because my whole gene set (8000 genes) is obviously enriched for various terms (for example, tumor-related ones) by itself. Could anyone suggest an online tool or R package that takes two gene lists (one with the genes of interest, one with all genes), queries its database, and returns p-values and FDRs? Thanks.
You want to set your background to be the set of 8000 expressed genes, so that the Fisher exact test, hypergeometric test, etc. compares the proportion of annotated genes in your list against the proportion among those 8000 genes and not among all possible genes. You do not want to run separate enrichment tests that leave you with two lists of results. This is important for the reasons you described above.
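To make the role of the background concrete, here is a minimal sketch of the test for a single term, written in plain Python. It is not what any particular tool does internally; the function names, the toy gene IDs, and the choice of a one-sided hypergeometric tail followed by a Benjamini-Hochberg adjustment are my own assumptions for illustration.

```python
from math import comb


def enrichment_pvalue(background, interest, annotated):
    """One-sided hypergeometric p-value for over-representation of a term.

    background: all expressed genes (e.g. the 8000 genes)
    interest:   the genes of interest (e.g. the 1000 genes)
    annotated:  genes carrying the term being tested
    Genes outside the background are ignored.
    """
    background = set(background)
    interest = set(interest) & background
    annotated = set(annotated) & background

    N = len(background)            # background size
    K = len(annotated)             # annotated genes in the background
    n = len(interest)              # genes of interest
    k = len(interest & annotated)  # annotated genes among the genes of interest

    # P(X >= k) when drawing n genes without replacement from the background.
    # Exact integer arithmetic avoids overflow for large N.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data with made-up gene IDs, purely for illustration.
background = range(10)
interest = {0, 1, 2}
terms = {"term_A": {0, 1, 5}, "term_B": {7, 8, 9}}

pvals = [enrichment_pvalue(background, interest, genes) for genes in terms.values()]
fdrs = benjamini_hochberg(pvals)
for name, p, q in zip(terms, pvals, fdrs):
    print(name, p, q)
```

The key point is that `N` and `K` are counted within the 8000 expressed genes only, so a term that is common across the whole tumor expression set does not look enriched just because it is common.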
As one example, I believe your background is defined as the optional "gene space" parameter in FuncAssociate:
There's a Bioconductor view for GO that contains a number of popular tools. Also, you could perform a standard differential expression analysis and then use the DE genes as the "enriched" set and the remainder as the background.
You can use the DAVID web server (http://david.abcc.ncifcrf.gov/) for functional analysis; it lets you set a background set. In my experience, though, when doing functional analysis on lists of more than 200 genes you are always going to get some false-positive hits. Consider filtering your list further.