Hi all,

I performed a transcription factor binding site (TFBS) enrichment analysis in R. Briefly, I have a co-regulated pathway in a disease state, and I wanted to see whether any TFs have binding sites that are significantly enriched in the promoter sequences of the genes in this pathway. I took the 5000 bp upstream of the TSS of each gene and scanned it for binding sites for the 68 human motifs in the JASPAR database. I then used hypergeometric testing against a background of 20,000 genes (also 5000 bp upstream) and found 4 of the 68 TFs to be enriched.

My question is: what can I do with this information next? I know that is probably a stupidly broad question, but I'm at a bit of a loss. I thought about trying to correlate the number of TFBSs with gene expression level, but the regulatory architecture is probably complex enough to make that relationship meaningless. I have access to plenty of gene expression data through various GEO datasets.
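For concreteness, the enrichment test described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual R code used: the motif names, the pathway size of 40 genes, and all hit counts are made up, and the Benjamini-Hochberg adjustment is an assumption about how the 68 tests might be corrected for multiple testing, since the post does not say whether a correction was applied.

```python
from math import comb

def hypergeom_upper_tail(k, M, K, n):
    """P(X >= k): chance of seeing at least k motif-bearing genes in a
    pathway of n genes, drawn without replacement from M background
    genes of which K carry at least one hit for the motif."""
    total = comb(M, n)
    hi = min(K, n)
    tail = sum(comb(K, i) * comb(M - K, n - i) for i in range(k, hi + 1))
    return tail / total

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical numbers for illustration only.
BACKGROUND_GENES = 20000   # background size from the post
PATHWAY_GENES = 40         # assumed pathway size
# motif -> (background genes with a hit, pathway genes with a hit)
hit_counts = {
    "MOTIF_A": (3000, 15),
    "MOTIF_B": (8000, 17),
    "MOTIF_C": (500, 6),
}

motifs = list(hit_counts)
pvals = [
    hypergeom_upper_tail(hit_counts[m][1], BACKGROUND_GENES,
                         hit_counts[m][0], PATHWAY_GENES)
    for m in motifs
]
for motif, p, q in zip(motifs, pvals, benjamini_hochberg(pvals)):
    print(f"{motif}\tp={p:.3g}\tadjusted={q:.3g}")
```

In a real run all 68 p-values would go into the adjustment together, not just the few shown here. The same function applied to every pathway in a gene set collection, rather than to one pathway, is one way to turn this into a screen.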
I also thought about checking whether these TFBSs are enriched in any other pathways, but I'm not sure how I would go about that without already having specific pathways in mind. I was hoping for something a little more related to discovery than to simple hypothesis testing.
Any help is appreciated, and sorry for what is probably an all-too-vague question.