Question: count the number of transcription factor binding sites
6.4 years ago by
Jessica wrote:

Hi all, 

Given ChIP-Seq data of a transcription factor, what tools are used to count the number of binding sites of the transcription factor in the whole genome?




What format is your data in? BAM? BED?

written 6.4 years ago by Ming Tang

It is in BED format.

written 6.4 years ago by Jessica

So it is already a peak file. In that case, each line is a putative binding site. I do not quite understand your question; please state it more clearly.
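
If the BED file really is a peak file, counting putative binding sites comes down to counting interval records. A minimal Python sketch of that idea follows; the function name `count_bed_records` and the toy data are made up for illustration, and it assumes header lines start with `#`, `track`, or `browser`.

```python
import io


def count_bed_records(handle):
    """Count interval records in a BED stream.

    Blank lines, '#' comments, and 'track'/'browser' header lines are
    skipped (assumption: no chromosome name starts with those words).
    """
    n = 0
    for line in handle:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        n += 1
    return n


# Toy peak file: one header line, three peaks, one blank line.
example = io.StringIO(
    "track name=peaks\n"
    "chr1\t100\t200\tpeak1\n"
    "chr1\t500\t650\tpeak2\n"
    "\n"
    "chr2\t40\t90\tpeak3\n"
)
print(count_bed_records(example))  # 3
```

For a real file, pass an open file handle instead of the `io.StringIO` object.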

written 6.4 years ago by Ming Tang
6.4 years ago by
WCIP | Glasgow | UK
dariober wrote:

I don't think your question has a simple, clear-cut answer. In a perfect world, you run your ChIP-seq data (as aligned reads, BAM or BED) through a peak caller, e.g. MACS, and each region identified is a binding site, as mentioned by tangming2005.

However, the situation is typically far from perfect for a number of reasons.

1) ChIP enrichment is often quite nonspecific and noisy, depending on the quality of the antibody. It is not unusual to have >90% of the reads in the background, i.e. not in peaks.

2) Some genomic regions tend to be enriched with whatever antibody you use (an artifact that might be due to the way the reference genome is assembled, especially with respect to repetitive regions).

3) Different peak callers/algorithms can give different numbers of peaks, and the difference can be orders of magnitude. The same goes for different parameters within the same peak caller.

4) Typically, the more you sequence, the more peaks you identify, because small bumps in the signal become significant.

5) Even if the ChIP works perfectly and the peak callers are ideal, there might be opportunistic sites where the transcription factor binds without having much biological relevance (as an aside, possibly related: some ChIP-seq experiments generate many more peaks than there are genes in the whole genome).

In practice, you could treat as "true" binding sites the peaks that are identified in different replicates and/or that overlap a known sequence motif recognized by your transcription factor (see also the irreproducible discovery rate, IDR).
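
The replicate criterion can be sketched in Python as a plain interval overlap. This is a simplification and not the IDR method itself: it keeps peaks from one replicate that share at least 1 bp with any peak in the other. The function names and the toy coordinates are made up for illustration, and peaks are assumed to be `(chrom, start, end)` tuples in half-open BED coordinates.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_by_chrom(peaks):
    """Group peaks by chromosome and merge overlapping/touching intervals.

    Returns {chrom: (starts, ends)} with both lists sorted.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def reproducible_peaks(rep_a, rep_b):
    """Return the peaks of rep_a that overlap any peak of rep_b by >= 1 bp."""
    merged_b = merge_by_chrom(rep_b)
    kept = []
    for chrom, start, end in rep_a:
        if chrom not in merged_b:
            continue
        starts, ends = merged_b[chrom]
        # Last merged interval in rep_b that starts before this peak ends.
        i = bisect_left(starts, end) - 1
        if i >= 0 and ends[i] > start:
            kept.append((chrom, start, end))
    return kept


# Toy data: only the first chr1 peak is shared between replicates.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 50)]
rep2 = [("chr1", 150, 250), ("chr1", 600, 700), ("chr3", 10, 50)]
print(reproducible_peaks(rep1, rep2))  # [('chr1', 100, 200)]
```

Note that peaks which merely touch (one ends at 600, the other starts at 600) do not count as overlapping under half-open coordinates.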

In my opinion, asking "Where are the binding sites?" is not fruitful because of the problems above. It is better to ask which binding sites differ between conditions (treatments, stages, tissues, whatever). This way the quirks associated with ChIP, peak callers, etc. are averaged out across replicates and conditions.


5.4 years ago by
Kamil wrote:

You might be interested to read my tutorial on how to use CENTIPEDE to determine if a transcription factor is bound to a genomic site by making use of DNase-Seq data.


Thanks for the tutorial!


written 5.4 years ago by Ming Tang
5.1 years ago by
Fidel wrote:

A practical way to decide whether your peak is a true peak rather than nonspecific binding is to check whether a motif associated with your transcription factor is present at the peak. This can be done using the MEME Suite. Of course, this approach assumes that your ChIP is for a protein that binds DNA directly.


You could use FIMO in the MEME suite to scan for motif models (JASPAR, etc.) across your genome of interest. Take the search result and convert it to a BED file. Then do set operations with BEDOPS tools (like bedmap) to find putative TF binding sites that overlap your ChIP-seq peaks.
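
The conversion and overlap steps could look roughly like this in Python. It is only a sketch under stated assumptions: the FIMO results are taken to be the tab-separated `fimo.tsv` table with `sequence_name`, `start`, `stop`, `strand`, and `motif_id` columns, FIMO starts are treated as 1-based inclusive, and `sequence_name` is assumed to be the chromosome name. The function names and toy rows are made up, and at genome scale a tool such as bedmap is the better choice.

```python
import csv
import io


def fimo_tsv_to_bed(handle):
    """Yield (chrom, start, end, motif_id, strand) in 0-based half-open BED coordinates."""
    # Skip blank lines and the trailing '#' comment lines in the table.
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    for row in csv.DictReader(rows, delimiter="\t"):
        yield (
            row["sequence_name"],
            int(row["start"]) - 1,  # 1-based inclusive -> 0-based
            int(row["stop"]),
            row["motif_id"],
            row["strand"],
        )


def peaks_with_motif(peaks, sites):
    """Return the peaks that overlap at least one motif site by >= 1 bp."""
    by_chrom = {}
    for chrom, start, end, *_ in sites:
        by_chrom.setdefault(chrom, []).append((start, end))
    return [
        (chrom, start, end)
        for chrom, start, end in peaks
        if any(s < end and e > start for s, e in by_chrom.get(chrom, []))
    ]


# Toy FIMO table (values invented for illustration).
fimo_text = (
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n"
    "MA0139.1\tCTCF\tchr1\t151\t169\t+\t20.1\t1e-07\t0.01\tNNNNNNNNNNNNNNNNNNN\n"
    "MA0139.1\tCTCF\tchr2\t1001\t1019\t-\t18.4\t3e-07\t0.02\tNNNNNNNNNNNNNNNNNNN\n"
    "\n"
    "# trailing comment line\n"
)
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 900, 1000)]

sites = list(fimo_tsv_to_bed(io.StringIO(fimo_text)))
print(peaks_with_motif(peaks, sites))  # [('chr1', 100, 200)]
```

The chr2 site starts exactly where the chr2 peak ends, so it does not count as an overlap.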

written 5.1 years ago by Alex Reynolds