Where are the actual transcription factor binding sites located within ChIP-seq peaks?
6.8 years ago

Peak callers such as MACS2 and peakzilla report ChIP-seq peaks for a given transcription factor as regions of varying size, from about 50 bp up to 1000 bp or more. If the average transcription factor binding site (TFBS) is about 10 bp, based on available data in ORegAnno etc. (as indicated by this post: https://www.biostars.org/p/64854/), is it safe to assume that the TFBS lies at the center of a given peak? If not, what would be a good way to locate it, and does a method for this already exist?

TFBS CHIP-seq peaks MACS peak calls • 2.8k views
6.8 years ago

No, it would not be safe to assume that the binding site is at the center of the peak. If you know the motif for your protein, you can simply search for that motif in your ChIP-seq peaks. If you do not know the motif, you can use any of several software packages that look for sequences enriched across your peaks to define what the motif is. In both cases, once you have the motif, you "know" the location.
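As a rough illustration of the "search for that motif in your peaks" step, here is a minimal Python sketch. It assumes the motif can be written as an IUPAC consensus string and that you have already extracted the peak sequence; the function names (`find_sites`, `consensus_to_regex`) are made up for this example. A real analysis would more likely score a position weight matrix, as FIMO does.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a DNA string, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(peak_seq, consensus):
    """Return (start, end, strand, offset_from_center) for each motif hit.

    Coordinates are 0-based, half-open, relative to the peak sequence.
    A palindromic consensus is reported once per strand.
    """
    peak_seq = peak_seq.upper()
    consensus = consensus.upper()
    center = len(peak_seq) / 2
    hits = []
    for strand, pattern in (("+", consensus), ("-", reverse_complement(consensus))):
        for m in consensus_to_regex(pattern).finditer(peak_seq):
            start = m.start(1)
            end = start + len(consensus)
            hits.append((start, end, strand, (start + end) / 2 - center))
    return sorted(hits)


# Toy 20 bp "peak" with one GATA-like site (consensus WGATAR) on the plus strand.
print(find_sites("CCCCCCAGATAACCCCCCCC", "WGATAR"))
```

The last field is the distance of the site midpoint from the middle of the peak sequence, which is one way to check how far off the "center" assumption is for your data.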

6.8 years ago

You could use FIMO in the MEME suite to scan for motif models (JASPAR, etc.) across your genome of interest, at a level of statistical significance that you deem acceptable (e.g., 1e-4, 1e-5).

Take the search result from FIMO and convert it to a UCSC BED-formatted file. This file contains the putative binding sites of all motifs from the motif models across the whole genome.
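The conversion step could look roughly like the sketch below. It assumes the tab-separated FIMO output of recent MEME Suite releases, with a header row containing `motif_id`, `sequence_name`, `start`, `stop`, `strand`, `score` and `p-value`, and 1-based inclusive coordinates; check the header of your own file, since older versions use different column names. `fimo_tsv_to_bed` is a name made up for this example.

```python
import csv


def fimo_tsv_to_bed(tsv_lines, max_pvalue=1e-4):
    """Yield BED6 lines (0-based, half-open) from FIMO TSV lines.

    Assumes the column names listed in the text above; adjust if your
    FIMO version writes a different header.
    """
    # Skip blank lines and any '#' comment lines in the file.
    rows = (ln for ln in tsv_lines if ln.strip() and not ln.startswith("#"))
    for rec in csv.DictReader(rows, delimiter="\t"):
        if float(rec["p-value"]) > max_pvalue:
            continue
        start = int(rec["start"]) - 1  # 1-based start -> 0-based start
        stop = int(rec["stop"])        # 1-based inclusive end == 0-based exclusive end
        # The FIMO score is kept as-is; strict UCSC BED expects an integer score.
        yield "\t".join([rec["sequence_name"], str(start), str(stop),
                         rec["motif_id"], rec["score"], rec["strand"]])


# Made-up example rows; "motifA" is a placeholder, not a real motif ID.
example = [
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n",
    "motifA\t\tchr1\t101\t106\t+\t12.3\t1e-05\t0.01\tAGATAA\n",
    "motifA\t\tchr1\t500\t505\t-\t3.1\t0.01\t0.9\tTGATAG\n",
    "\n",
    "# comment line\n",
]
for bed_line in fimo_tsv_to_bed(example):
    print(bed_line)
```

BEDOPS tools expect sorted input, so run the resulting file (and your peaks) through sort-bed before using bedmap.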

Then do set operations with BEDOPS tools (like bedmap) to precisely locate putative TF binding sites that overlap with - are contained entirely or partially within - your ChIP-seq peaks.

$ bedmap --echo --echo-map peaks.bed wg-motifs.bed > answer.bed


If you repeat your experiments with other motifs, you can reuse your whole-genome search result to apply the same set operations.


What if there is more than one enriched motif in a peak region because the peak region is large?


The bedmap result will show all overlapping motifs per peak.
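To work with several motifs per peak, the output can be parsed along these lines. This is only a sketch: it assumes bedmap's default delimiters for `--echo --echo-map` (a `|` between the peak and its mapped elements, `;` between mapped elements), and it ranks motifs by distance from the peak midpoint, which is an illustrative choice rather than anything bedmap does. The function names are made up for this example.

```python
def parse_bedmap_line(line):
    """Split one `bedmap --echo --echo-map` output line into (peak, motifs).

    Each element is returned as a list of its tab-separated BED fields.
    """
    peak_part, _, map_part = line.rstrip("\n").partition("|")
    peak = peak_part.split("\t")
    motifs = [m.split("\t") for m in map_part.split(";") if m]
    return peak, motifs


def motif_offsets(line):
    """Return (motif_name, offset) pairs, nearest to the peak midpoint first."""
    peak, motifs = parse_bedmap_line(line)
    peak_mid = (int(peak[1]) + int(peak[2])) / 2
    out = []
    for m in motifs:
        motif_mid = (int(m[1]) + int(m[2])) / 2
        name = m[3] if len(m) > 3 else "."  # BED name column, if present
        out.append((name, motif_mid - peak_mid))
    return sorted(out, key=lambda pair: abs(pair[1]))


# Made-up output line: one 200 bp peak overlapping two motif hits.
line = "chr1\t100\t300\tpeak1|chr1\t150\t160\tmotifA;chr1\t195\t205\tmotifB\n"
print(motif_offsets(line))
```

If your peaks come from MACS2 narrowPeak files, the summit offset column may be a better reference point than the midpoint used here.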

6.8 years ago
jotan ★ 1.2k

This is a lab-based rather than bioinformatics-based method, but have you considered ChIP-exo? http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3813302/

I believe that if you have a very high-density standard ChIP-seq track, it's possible to mine the data for the same type of information. In a deeply sequenced ChIP-seq track, it's sometimes possible to identify regions with truncated reads, where the termination points mark out the TF binding region. This also depends on very good sonication of the samples.