Question

question about TF enrichment analysis

0


5.4 years ago

tujuchuanli ▴ 100

Hi there, I want to perform a transcription factor (TF) binding analysis. My goal is to find the TFs that are over-represented or under-represented in one gene set relative to another gene set.

I have two gene sets (setA and setB). One contains 1000 genes and the other contains 5000 genes.

I extract the promoter region of each gene in these two gene sets.
For each gene set, I record the number of genes whose promoter region is bound by TF A (for example, 800 for setA and 300 for setB). I also record the number of genes whose promoter region is not bound by TF A (for example, 200 for setA and 4700 for setB). I downloaded the full set of TF binding profiles from the JASPAR database. There are over 500 TF binding profiles; I just take TF A as an example here.
Now I have four numbers, so I perform a chi-squared test using the chisq.test function in R and get the P value.
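
As a minimal sketch of the step above, the four example counts for TF A can be arranged as a 2x2 table and passed to chisq.test. The object names (counts, chisq_res, odds_ratio) are illustrative assumptions, and the odds ratio is added here only to show the direction of the effect (over- or under-representation in setA), which the p-value alone does not give.

```r
# 2x2 contingency table for one TF, using the example counts from the post
counts <- matrix(
  c(800,  200,
    300, 4700),
  nrow = 2, byrow = TRUE,
  dimnames = list(gene_set = c("setA", "setB"),
                  binding  = c("bound", "not_bound"))
)

# Pearson chi-squared test of independence
# (for 2x2 tables, chisq.test applies Yates' continuity correction by default)
chisq_res <- chisq.test(counts)

# Odds ratio: > 1 suggests the TF is over-represented in setA,
# < 1 suggests it is under-represented in setA
odds_ratio <- (counts["setA", "bound"] * counts["setB", "not_bound"]) /
              (counts["setA", "not_bound"] * counts["setB", "bound"])

chisq_res$p.value
odds_ratio
```

With over 500 TF profiles, the same table would be built once per TF, so the resulting p-values would presumably need some multiple-testing handling before interpretation.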

The first question is whether the above is OK or not.

For some reasons, the promoter regions of the genes in setA and setB cannot be guaranteed to have the same length, although the average lengths in the two gene sets are quite close. I think I should adjust for this, because a longer promoter region should contain more binding sites. The second question is: how do I adjust for it?

TF enrichment analysis • 1.3k views

ADD COMMENT • link updated 5.2 years ago by liux.bio ▴ 360 • written 5.4 years ago by tujuchuanli ▴ 100

1

The chi-squared test seems a reasonable choice, given the data that you have accumulated. If you want to adjust for promoter length, why not build a regression model (somehow) and include the length as a covariate? With the model, you can then extract the ANOVA chi-squared p-value like this:

anova(model,test="Chisq")
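
One possible way to build such a model, sketched here under assumptions rather than taken from the thread: a logistic regression with one row per gene, where bound (0/1) is the response and promoter_length and gene_set are predictors. The data frame genes, its column names, and the simulated values are all hypothetical and only stand in for real per-gene data for a single TF.

```r
set.seed(1)  # simulated data only, for illustration

# Hypothetical per-gene table for a single TF: 1000 genes in setA, 5000 in setB
n_a <- 1000
n_b <- 5000
genes <- data.frame(
  gene_set = factor(rep(c("setA", "setB"), times = c(n_a, n_b)),
                    levels = c("setB", "setA")),  # setB is the reference level
  promoter_length = round(runif(n_a + n_b, min = 500, max = 2000))
)

# Simulate binding that depends on both promoter length and gene set
lin_pred <- -3 + 0.001 * genes$promoter_length + 1.5 * (genes$gene_set == "setA")
genes$bound <- rbinom(nrow(genes), size = 1, prob = plogis(lin_pred))

# Logistic regression; promoter_length is entered first so that the
# gene_set term is tested after adjusting for length
model <- glm(bound ~ promoter_length + gene_set, data = genes, family = binomial)

# Sequential analysis-of-deviance table with chi-squared p-values
dev_table <- anova(model, test = "Chisq")
dev_table

# p-value for the gene-set effect, adjusted for promoter length
p_gene_set <- dev_table["gene_set", "Pr(>Chi)"]
```

Because anova on a glm tests terms sequentially, the order of terms in the formula matters: placing gene_set last makes its row the length-adjusted test. The sign of the gene_set coefficient in summary(model) indicates over- or under-representation in setA.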

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k

score 0 · Answer 1 · 2019-02-23

0


5.2 years ago

liux.bio ▴ 360

For TF enrichment analysis, you can try HOMER.

ADD COMMENT • link 5.2 years ago by liux.bio ▴ 360