Question: question about TF enrichment analysis
5 weeks ago by
tujuchuanli30 wrote:

Hi there, I want to perform transcription factor (TF) binding analysis. My goal is to find the TFs that are over-represented or under-represented in one gene set relative to another gene set.

I have two gene sets (setA and setB). One contains 1000 genes and the other contains 5000 genes.

  1. I extract the promoter region of each gene in these two gene sets.

  2. I record the number of genes whose promoter region is bound by TF A in each gene set (for example, 800 for setA and 300 for setB). I also record the number of genes whose promoter region is not bound by TF A (for example, 200 for setA and 4700 for setB). I downloaded all TF binding profiles from the JASPAR database. There are over 500 TF binding profiles; I just take TF A as an example here.

  3. Now I have four numbers, so I perform a chi-square test using the chisq.test function in R and get the P value.
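As a rough illustration of step 3, the four example counts can be arranged as a 2x2 table (rows: setA and setB; columns: bound and not bound) and tested as sketched below. This is a minimal standard-library Python sketch, not the R code itself. The function name `chisq_2x2` is made up for the example, and the Yates continuity correction is switched on because R's `chisq.test` applies it to 2x2 tables by default. The odds ratio is included because the p-value alone does not say whether TF A is over- or under-represented.

```python
import math

def chisq_2x2(a, b, c, d, yates=True):
    """Pearson chi-square test on the 2x2 table [[a, b], [c, d]].

    Rows are gene sets, columns are bound / not bound.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction: shrink each deviation by at most 0.5.
                diff = max(0.0, diff - 0.5)
            stat += diff * diff / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Example counts from the post: setA 800 bound / 200 not, setB 300 bound / 4700 not.
stat, p_value = chisq_2x2(800, 200, 300, 4700)

# Odds ratio > 1 means TF A is bound more often in setA than in setB.
odds_ratio = (800 * 4700) / (200 * 300)

print("chi-square statistic:", stat)
print("p-value:", p_value)
print("odds ratio (setA vs setB):", odds_ratio)
```

With counts this lopsided the statistic is very large and the p-value underflows to zero in double precision, so the odds ratio is the more informative number here. Repeating this for each of the 500+ profiles would give one statistic and one odds ratio per TF.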

The first question is whether the above is OK or not.

For some reasons, the promoter regions of the genes in setA and setB cannot be guaranteed to have the same length, although the average lengths in the two gene sets are quite close. I think I should adjust for this, because a longer promoter region should have more binding. The second question is: how do I adjust for it?


The chi-square test seems a reasonable choice, given the data that you have accumulated. If you want to adjust for promoter length, why not build a regression model (somehow) and include the length as a covariate? With the model, you can then extract the ANOVA chi-square p-value.
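One possible reading of this suggestion, sketched in standard-library Python rather than R: fit a per-gene logistic regression of "bound by TF A" on gene-set membership plus promoter length, fit a second model with promoter length only, and compare the two with a likelihood-ratio chi-square test (1 degree of freedom). In R the analogous route would presumably be two `glm(..., family = binomial)` fits compared with `anova(..., test = "Chisq")`. Everything below is an assumption made for illustration: the data are synthetic, and the helper names (`solve`, `fit_logistic`) and the simulated effect sizes are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * pv for v, pv in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_logistic(X, y, max_iter=50):
    """Fit a logistic regression by Newton-Raphson.

    Returns (coefficients, log-likelihood at the fitted coefficients).
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(bj * xj for bj, xj in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        step = solve(hess, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(sj) for sj in step) < 1e-10:
            break
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(bj * xj for bj, xj in zip(beta, xi))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik

# Synthetic data (NOT real binding calls): one row per gene with
# set membership, promoter length in kb, and a simulated bound/not-bound flag.
rng = random.Random(0)
genes = []
for is_set_a, n_genes in ((1, 200), (0, 1000)):
    for _ in range(n_genes):
        length_kb = rng.uniform(1.0, 3.0)
        eta = -2.0 + 1.5 * is_set_a + 0.5 * length_kb  # assumed effect sizes
        bound = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
        genes.append((is_set_a, length_kb, bound))

y = [g[2] for g in genes]
X_full = [[1.0, g[0], g[1]] for g in genes]   # intercept + set + length
X_reduced = [[1.0, g[1]] for g in genes]      # intercept + length only

beta_full, ll_full = fit_logistic(X_full, y)
_, ll_reduced = fit_logistic(X_reduced, y)

# Likelihood-ratio test for the gene-set term, adjusted for promoter length.
lr_stat = max(0.0, 2.0 * (ll_full - ll_reduced))
p_value = math.erfc(math.sqrt(lr_stat / 2.0))

print("set coefficient (log odds ratio):", beta_full[1])
print("LR chi-square:", lr_stat, "p-value:", p_value)
```

The sign of the set coefficient indicates the direction (positive: more binding in setA after accounting for length). Lengths are expressed in kb only to keep the covariate on a small numeric scale.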

written 5 weeks ago by Kevin Blighe