What significance test should be used to analyse TFBS?
8.8 years ago

I have TRANSFAC data on the number of transcription factor binding sites (TFBS) for each gene.

I have ~1100 transcription factors (TFs) and 2 sets of genes: 18 pigmentation genes and 5 housekeeping (HK) genes.

          Pigmentation genes                      House Keeping genes
        G1 G2 G3 G4 ........ G18                   G1 G2 G3 G4 G5
TF1
TF2
..
..
TF1100

So, I have the number of binding sites of each TF on each gene, and I want to find which of the 1100 TFs have more binding sites on pigmentation genes than on HK genes.

What statistical analysis should I use for such data?

The 18 pigmentation genes and the 5 HK genes are different genes within each group, not replicates of one another. So I cannot use a t-test, right?

Also, I checked the distribution of binding sites for each TF in the pigmentation and HK groups, and only some have normal distributions. So I think I cannot use parametric tests.

Should I use Fisher's exact test (m x 2)? Which other tests can be used for such data?

8.8 years ago
UnivStudent ▴ 430

Fisher's exact test or the hypergeometric test can work for this type of analysis. Since the data are counts of binding sites, you'll need to choose a threshold for calling a gene a target of a TF (probably >= 1 site), but you can also test for higher numbers of binding sites if you choose.
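To make the thresholding step concrete, here is a minimal Python sketch of a one-sided enrichment test for a single TF. It uses only the standard library and computes the hypergeometric upper tail, which is the same quantity as the one-sided Fisher's exact test p-value on the 2 x 2 table (has site / no site vs. pigmentation / HK). The function names, the default `threshold=1`, and the example counts are illustrative assumptions, not values from the TRANSFAC data.

```python
from math import comb


def hypergeom_upper_tail(k, n_query, n_hits_total, n_total):
    """P(X >= k) for X ~ Hypergeometric: n_query genes drawn from n_total,
    of which n_hits_total pass the binding-site threshold."""
    denom = comb(n_total, n_query)
    upper = min(n_hits_total, n_query)
    tail = sum(
        comb(n_hits_total, i) * comb(n_total - n_hits_total, n_query - i)
        for i in range(k, upper + 1)
    )
    return tail / denom


def enrichment_pvalue(pig_counts, hk_counts, threshold=1):
    """One-sided p-value that a TF hits pigmentation genes more often than HK genes.

    A gene counts as a 'hit' if it has at least `threshold` binding sites
    (the threshold is an assumption you choose, as discussed above).
    """
    pig_hits = sum(c >= threshold for c in pig_counts)
    hk_hits = sum(c >= threshold for c in hk_counts)
    n_pig, n_hk = len(pig_counts), len(hk_counts)
    return hypergeom_upper_tail(pig_hits, n_pig, pig_hits + hk_hits, n_pig + n_hk)


# Hypothetical counts for one TF: 18 pigmentation genes, 5 HK genes.
pig_example = [2, 1, 3, 1, 1, 4, 2, 1, 1, 2, 1, 3, 1, 1, 2, 1, 1, 1]
hk_example = [0, 0, 0, 0, 0]
p_example = enrichment_pvalue(pig_example, hk_example)
```

You would run this once per TF (one row of the matrix at a time). Because ~1100 TFs are tested, the resulting p-values would normally need a multiple-testing correction before interpretation.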

One other thing to consider is that housekeeping gene promoters might be too dissimilar in sequence from your pigmentation gene promoters, in which case you'll observe false-positive enrichments. Other good backgrounds could include all genes' promoters, or promoters that have a similar GC content and/or dinucleotide frequencies to your query set.
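As a rough illustration of picking a GC-matched background, the Python sketch below keeps candidate promoters whose GC fraction falls within a tolerance of the range seen in the query set. The function names, the `tolerance` value, and the toy sequences are assumptions for illustration only. It does not match dinucleotide frequencies, which would need a separate step.

```python
def gc_content(seq):
    """Fraction of G/C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_matched_background(query_seqs, candidate_seqs, tolerance=0.05):
    """Return names of candidate promoters whose GC fraction lies within
    `tolerance` of the GC range spanned by the query promoters.

    Both arguments are dicts mapping gene name -> promoter sequence.
    Genes already in the query set are excluded from the background.
    """
    query_gc = [gc_content(s) for s in query_seqs.values()]
    low, high = min(query_gc) - tolerance, max(query_gc) + tolerance
    return [
        name
        for name, seq in candidate_seqs.items()
        if name not in query_seqs and low <= gc_content(seq) <= high
    ]


# Toy example with made-up sequences and gene names.
query = {"pig1": "GGCCAT", "pig2": "GCGCAT"}
candidates = {"geneA": "GGCCTT", "geneB": "ATATAT", "geneC": "GCGCGC"}
background = gc_matched_background(query, candidates)
```

The selected background genes would then replace the 5 HK genes as the comparison group in the enrichment test.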
