Over representation of TFBS
9.1 years ago
bionovice • 0

Hi guys,

I am looking to check for over-representation of TFBS. I know this is usually done with Fisher's exact test, but it doesn't seem to function suitably.

I have my TF hits and their frequencies under the different treatments. I also have the frequencies for scrambled sequences.

If a certain TFBS has a non-zero frequency, my table has a "yes" value for it; if its frequency is 0, it has a "no" value.

I just cannot seem to crack it. All help will be appreciated. Thank you.

tfbs transcription sequence • 2.2k views

Just a note: There has been over-representation of this question on the forum already:


I know there has, but the reason I'm posting again is that the earlier threads haven't answered my question or helped.


Hmmm, I see.

9.0 years ago

The result of Fisher's exact test depends on:

a) The size of the enrichment.
b) The number of TFBS in the genome.
c) The number of candidate genes tested for enrichment, and the number of other genes.

If "it doesn't seem to function suitably" means that a TF you expected is not coming up, you presumably have prior knowledge about a TF involved in your context.

You might therefore want to do a quick check of whether TFBS enrichment for this factor is plausible. If you know how many possible TFBS exist for it in the genome, you could simulate how strong the enrichment would have to be, given your list of candidate genes. Could the necessary enrichment be reached within your experimental setup? Is the amount of enrichment (e.g., fold change) reasonable compared to enrichments observed in similar biological contexts (e.g., similar stimulation, similar or same tissue or cell type)?
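A rough sketch of that check, assuming genes are the unit being counted: out of N_GENES genes in the genome, K_WITH_SITE carry at least one predicted site for the factor, and N_CANDIDATES is the size of your candidate list. All three numbers and the 0.05 cutoff are made up for illustration. The code asks how many candidate genes would need a site before a one-sided hypergeometric test (the same tail that a one-sided Fisher's exact test uses) drops below the cutoff, and what fold enrichment that corresponds to.

```python
from math import comb

def upper_tail_probs(N, K, n):
    """Return a list t where t[k] = P(X >= k) for X ~ Hypergeometric(N, K, n)."""
    total = comb(N, n)
    top = min(K, n)
    # Number of ways to draw exactly i genes with a site among n candidates.
    terms = [comb(K, i) * comb(N - K, n - i) for i in range(top + 1)]
    tails, running = [0.0] * (top + 1), 0
    for k in range(top, -1, -1):  # accumulate from the top to get P(X >= k)
        running += terms[k]
        tails[k] = running / total
    return tails

def min_significant_hits(N, K, n, alpha=0.05):
    """Smallest k with P(X >= k) < alpha, or None if no k reaches it."""
    for k, p in enumerate(upper_tail_probs(N, K, n)):
        if p < alpha:
            return k
    return None

# Hypothetical numbers, replace with your own.
N_GENES = 20_000       # genes in the genome
K_WITH_SITE = 2_000    # genes with a predicted site for the factor
N_CANDIDATES = 100     # genes in the candidate list

k_min = min_significant_hits(N_GENES, K_WITH_SITE, N_CANDIDATES)
expected = N_CANDIDATES * K_WITH_SITE / N_GENES
fold_needed = k_min / expected

print(f"expected hits by chance: {expected:.1f}")
print(f"hits needed for p < 0.05: {k_min}")
print(f"fold enrichment needed:   {fold_needed:.2f}")
```

If the fold enrichment this reports is far above what similar experiments have shown, the test is unlikely to flag the factor with a candidate list of that size.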

9.0 years ago

I don't know exactly what you're trying to do, but here's one possible approach:

  1. Measure counts of all TF binding sites in regions of interest (e.g., "treatment").
  2. Measure counts of specific-TF-of-interest binding sites in regions of interest.
  3. Measure counts of all TF binding sites across background (e.g., "whole-genome").
  4. Measure counts of specific-TF-of-interest sites across background.

Given the frequencies of observations across the whole genome, the probability of observing at least a certain number of specific-TF-of-interest sites in your regions can be calculated from these four counts using a hypergeometric distribution. You could then compute these probabilities for a set of TFs and treatments of interest and compare them.
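The four counts above map onto the hypergeometric tail roughly as follows. This is only a sketch: the TF names and every count are invented, and it assumes the background counts include the regions of interest (e.g., whole genome), so that the regions are a draw from the background.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing k or more sites of one TF among n sites
    drawn from a background of N sites, K of which belong to that TF."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Hypothetical counts, replace with your own.
N_BACKGROUND = 50_000   # step 3: all TF sites across the background
N_REGIONS = 500         # step 1: all TF sites in the regions of interest

# Per TF: (step 2: sites in regions, step 4: sites across background)
counts = {
    "TF_A": (40, 1_000),
    "TF_B": (12, 1_200),
    "TF_C": (0, 300),
}

pvalues = {
    tf: hypergeom_pvalue(k, N_REGIONS, K, N_BACKGROUND)
    for tf, (k, K) in counts.items()
}

# Most over-represented TFs first.
for tf, p in sorted(pvalues.items(), key=lambda item: item[1]):
    print(f"{tf}\t{p:.3g}")
```

With many TFs tested at once, the resulting p-values should be corrected for multiple testing before any of them is called significant.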
