Question: count the number of transcription factor binding sites
1
3.2 years ago by
Jessica60
Canada
Jessica60 wrote:

Hi all, 

Given ChIP-Seq data of a transcription factor, what tools are used to count the number of binding sites of the transcription factor in the whole genome?

Thanks,

Jessica

sequencing • 1.6k views
modified 23 months ago by Fidel1.8k • written 3.2 years ago by Jessica60

What format is your data in? BAM? BED?

written 3.2 years ago by tangming20052.1k

It is in the bed format.

written 3.2 years ago by Jessica60

So it is already a peak file. In that case, each line is a putative binding site. I do not quite understand your question; please state it more clearly.
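
If the BED file really is a peak file, counting putative sites comes down to counting its data lines. Here is a minimal Python sketch of that idea; the function name and the example records are made up for illustration, and it assumes header lines start with `#`, `track`, or `browser`.

```python
def count_bed_records(lines):
    """Count data lines in BED-formatted text, skipping headers and blanks."""
    count = 0
    for line in lines:
        stripped = line.strip()
        # Skip empty lines and the usual BED header/comment lines.
        if not stripped or stripped.startswith(("#", "track", "browser")):
            continue
        count += 1
    return count


# Hypothetical peak file content, one string per line.
example = [
    "track name=peaks",
    "chr1\t100\t200\tpeak1",
    "chr1\t500\t650\tpeak2",
    "",
    "chr2\t50\t120\tpeak3",
]
n_peaks = count_bed_records(example)
print(n_peaks)
```

For a real file you could pass an open file handle instead of the list, e.g. `count_bed_records(open("peaks.bed"))`, where `peaks.bed` is a placeholder name.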

written 3.2 years ago by tangming20052.1k
6
3.2 years ago by
dariober8.0k
Glasgow - UK
dariober8.0k wrote:

I don't think your question has a closed and easy answer. In a perfect world, you run your ChIP-seq data (as aligned reads, BAM or BED) through a peak caller, e.g. MACS, and each region identified is a binding site, as mentioned by tangming2005.

However, the situation is typically far from perfect for a number of reasons.

1) The ChIP enrichment is often quite nonspecific and noisy, depending on the quality of the antibody. It is not unusual to have >90% of the reads in the background, i.e. not in peaks.

2) Some genomic regions tend to be enriched with whatever antibody you use (an artifact that might be due to the way the reference genome is assembled, especially with respect to repetitive regions).

3) Different peak callers/algorithms might give different numbers of peaks, and the difference can even be orders of magnitude. The same goes for using different parameters within the same peak caller.

4) Typically, the more you sequence, the more peaks you identify, because small bumps become significant.

5) Even if the ChIP works perfectly and the peak callers are ideal, there might be opportunistic sites where the transcription factor binds without having much biological relevance (as an aside, possibly related: some ChIP-seq experiments generate many more peaks than there are genes in the whole genome).

In practice, you could consider as "true" binding sites the peaks which are identified in different replicates and/or which overlap a known sequence motif recognized by your transcription factor (see also the irreproducible discovery rate).
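
As a rough illustration of the replicate criterion, here is a minimal Python sketch that keeps only the peaks of one replicate that overlap a peak in a second replicate. This is a plain overlap filter, not the irreproducible discovery rate procedure; the function name and the coordinates are invented, and intervals are assumed to be half-open as in BED.

```python
from collections import defaultdict


def reproducible_peaks(rep1, rep2):
    """Return peaks from rep1 that overlap at least one peak in rep2.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    """
    # Group the second replicate by chromosome for quicker lookup.
    by_chrom = defaultdict(list)
    for chrom, start, end in rep2:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in rep1:
        # Two half-open intervals overlap if each starts before the other ends.
        if any(start < e and s < end for s, e in by_chrom.get(chrom, [])):
            kept.append((chrom, start, end))
    return kept


# Hypothetical peak calls from two replicates.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 120)]
rep2 = [("chr1", 150, 250), ("chr2", 120, 200)]
shared = reproducible_peaks(rep1, rep2)
print(shared)
```

The nested scan is fine for a sketch; for genome-scale peak sets a sorted sweep or a dedicated interval tool would be the more sensible choice.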

In my opinion, asking "Where are the binding sites?" is not fruitful because of the problems above. It is better to ask which binding sites differ between conditions (these might be treatments, stages, tissues, whatever). This way the quirks associated with ChIP, peak callers, etc. are averaged out across replicates and conditions.

 

modified 3.2 years ago • written 3.2 years ago by dariober8.0k
5
2.2 years ago by
Kamil1.6k
Boston
Kamil1.6k wrote:

You might be interested to read my tutorial on how to use CENTIPEDE to determine if a transcription factor is bound to a genomic site by making use of DNase-Seq data.

written 2.2 years ago by Kamil1.6k

Thanks for the tutorial !

Ming 

 

written 2.2 years ago by tangming20052.1k
0
23 months ago by
Fidel1.8k
Germany
Fidel1.8k wrote:

A practical way to decide whether your peak is a true peak and not nonspecific binding is to check whether there is a motif associated with your transcription factor at the peak. This can be done using the MEME suite. Of course, this solution assumes that your ChIP is for a protein that directly binds the DNA.

 

written 23 months ago by Fidel1.8k

You could use FIMO in the MEME suite to scan for motif models (JASPAR, etc.) across your genome of interest. Take the search result and convert it to a BED file. Then do set operations with BEDOPS tools (like bedmap) to find putative TF binding sites that overlap your ChIP-seq peaks.
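
The convert-then-intersect step could look roughly like the Python sketch below, used here as a stand-in for the BEDOPS set operations. It assumes the FIMO result is a tab-separated table with a header row containing `motif_id`, `sequence_name`, `start`, `stop`, and `strand`, and that FIMO coordinates are 1-based and inclusive; check this against the output of your MEME version. The motif ID, coordinates, and function names are invented for illustration.

```python
import csv
import io


def fimo_tsv_to_bed(handle):
    """Convert FIMO TSV rows to BED-like tuples (0-based, half-open)."""
    # Drop blank lines and trailing '#' comment lines before parsing.
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    bed = []
    for row in reader:
        start = int(row["start"]) - 1  # assumed 1-based inclusive -> 0-based
        end = int(row["stop"])
        bed.append((row["sequence_name"], start, end, row["motif_id"], row["strand"]))
    return bed


def motif_hits_in_peaks(hits, peaks):
    """Keep motif hits that overlap any (chrom, start, end) peak."""
    return [
        h for h in hits
        if any(h[0] == c and h[1] < e and s < h[2] for c, s, e in peaks)
    ]


# Hypothetical FIMO output and ChIP-seq peaks.
fimo_text = (
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n"
    "MA0000.1\tEXAMPLE\tchr1\t151\t160\t+\t12.3\t1e-5\t0.01\tACGTACGTAC\n"
    "MA0000.1\tEXAMPLE\tchr1\t901\t910\t-\t11.0\t2e-5\t0.02\tACGTACGTAC\n"
    "\n"
    "# trailing comment line\n"
)
hits = fimo_tsv_to_bed(io.StringIO(fimo_text))
peaks = [("chr1", 100, 200)]
hits_in_peaks = motif_hits_in_peaks(hits, peaks)
print(len(hits_in_peaks))
```

The number of motif hits that fall inside peaks is then one possible estimate of the putative binding site count, subject to the caveats raised in the other answers.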

modified 23 months ago • written 23 months ago by Alex Reynolds20k