Question: How to Calculate Over-Representation of TFBS of a Single TF per Gene
6.8 years ago, gozuyasli wrote:

I am trying to locate binding sites for a single, specific transcription factor in sequences of over 100 kb for ~1000 genes. However good the binding matrix is and however much I minimize the false positive rate, every matrix has a specific error rate. As a result, in sequences this long a binding site will be found in every gene. So I want to find the genes whose regulatory sequence is enriched for that specific binding site.

Which test should I use for such an enrichment analysis, and how should I apply it?

I can calculate the number of hits per gene in the test genes, and I approximately know the error rate of the binding matrix per kb for a given similarity cut-off (given in the Transfac database).
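To make those two inputs concrete, here is a minimal sketch of one possible per-gene test. It assumes that false-positive hits occur independently at a constant rate per kb, so the chance count in a sequence is roughly Poisson distributed; that model is an assumption, not something Transfac provides. The function names, gene names and numbers are all hypothetical.

```python
import math

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), summed directly over the tail."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed <= 0:
        return 1.0
    total = 0.0
    k = observed
    while True:
        # Poisson probability of exactly k hits, computed in log space
        term = math.exp(k * math.log(expected) - expected - math.lgamma(k + 1))
        total += term
        # Past the mean the terms only shrink; stop once they are negligible
        if k > expected and term < 1e-17 * max(total, 1e-300):
            break
        k += 1
    return min(1.0, total)

def per_gene_pvalues(hits_per_gene, lengths_kb, false_hits_per_kb):
    """Map each gene to the p-value of seeing at least its hit count by chance."""
    pvalues = {}
    for gene, hits in hits_per_gene.items():
        expected = false_hits_per_kb * lengths_kb[gene]  # chance hits for this length
        pvalues[gene] = poisson_upper_tail(hits, expected)
    return pvalues

# Made-up illustration: 0.1 false hits per kb, 150 kb scanned per gene
hits = {"geneA": 25, "geneB": 15}
lengths = {"geneA": 150.0, "geneB": 150.0}
pvals = per_gene_pvalues(hits, lengths, false_hits_per_kb=0.1)
```

With ~1000 genes tested this way, the resulting p-values would still need a multiple-testing correction before any gene is called enriched.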

Thanks for your help.

modified 6.8 years ago by md5sum • written 6.8 years ago by gozuyasli

If I understand correctly, you are looking at ~1000 sequences of 100 kb each. If so, this is probably inadvisable, as sequences of this length are likely to cover the regulatory regions of other genes. Apologies if I misunderstood! The main problem is that there will be a lot of background noise generated by genes other than the gene of interest.

written 6.8 years ago by Ian
6.8 years ago, Anthony Mathelier (University of Oslo, Oslo, Norway) wrote:

You should have a look at the new oPOSSUM3 tool published recently (http://www.ncbi.nlm.nih.gov/pubmed/22973536).

It is a web-based system for the detection of over-represented conserved transcription factor binding sites and binding site combinations in sets of genes or sequences.

http://opossum.cisreg.ca/oPOSSUM3/

6.8 years ago, PoGibas (Vilnius) wrote:

GREAT (Genomic Regions Enrichment of Annotations Tool) might also be helpful.

You should also try these:
CisFinder - a tool for finding over-represented short DNA motifs.
F-Match - a tool for identifying statistically over-represented transcription factor binding sites (TFBS) in a set of sequences compared against a control set.

modified 6.8 years ago • written 6.8 years ago by PoGibas
6.8 years ago, gozuyasli wrote:

GREAT is actually something different. It associates genomic regions of interest with genes and then performs an enrichment analysis of these genes for GO terms, pathways and such.

I had a brief look at oPOSSUM, but as far as I understand, it finds the enriched transcription factor binding sites for a group of genes. So it will tell me whether this group of genes is regulated by this factor or not, but it won't tell me whether each of these genes is individually enriched for binding sites of this TF.

What I wanted was a much simpler version of oPOSSUM. For instance, say I found 16 binding sites for TF1 in the regulatory sequence of geneA, and I expect to find 12 binding sites in any sequence of this size just by chance. Is the regulatory sequence of geneA really enriched for TF1 binding sites, or do I observe this number of binding sites just because of false positives?
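One way to put a number on this example, assuming that chance hits follow a Poisson distribution with mean 12 (a modelling assumption, not something established in this thread), is the probability of seeing 16 or more hits by chance alone:

```latex
P(X \ge 16 \mid \lambda = 12)
  = 1 - \sum_{k=0}^{15} \frac{e^{-12}\, 12^{k}}{k!}
  \approx 0.156
```

Under that assumption, 16 hits against 12 expected would not be significant at the usual 0.05 level.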

In the end I decided to use a hypergeometric test: count the number of hits and non-hits for the test sequence, calculate the number of hits expected by chance from the false positive rate, and then apply Fisher's exact test. This gives me a p-value for enrichment.
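The procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: "non-hits" is taken to mean scanned positions without a hit, the 100,000 positions per sequence is a hypothetical figure, and the expected count has to be rounded to an integer. The function names are made up for this example.

```python
import math

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns P(top-left cell >= a) with all margins fixed, i.e. the
    upper tail of the hypergeometric distribution.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    log_denominator = log_comb(row1 + row2, col1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        # Probability of k hits in row 1 and (col1 - k) hits in row 2
        p += math.exp(log_comb(row1, k) + log_comb(row2, col1 - k) - log_denominator)
    return min(1.0, p)

# Numbers from the post: 16 observed hits versus 12 expected by chance.
positions = 100_000        # hypothetical number of scanned positions
observed_hits = 16
expected_hits = 12         # would be rounded if derived from a per-kb rate
p_value = fisher_exact_greater(
    observed_hits, positions - observed_hits,
    expected_hits, positions - expected_hits,
)
```

Note that this treats the expected count as if it were a second observed sample with its own sampling noise, which makes the test more conservative than comparing the observed count directly against a known background rate.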

Are there any better approaches for such problems?

6.8 years ago, md5sum wrote:

MAST, which is part of the MEME suite, tests for enrichment of a single motif/PSSM/TFBS in a single sequence.
