Question

Where are the actual transcription factor binding sites located in ChIP-seq peaks?

8.5 years ago

Saad Khan ▴ 440

Peak-calling algorithms (e.g. MACS2, peakzilla, etc.) run on ChIP-seq data for a particular transcription factor identify peaks of varying sizes, from 50 bp up to 1000 bp or more. If the average transcription factor binding site (TFBS) is about 10 bp, based on available data in ORegAnno etc. (as indicated by this post), would it be safe to assume that the TFBS lies at the center of a given peak? If not, what would be a good way to locate it, and does such a method already exist or has someone already done this?

CHIP-seq-peaks MACS peak-calls TFBS • 3.2k views

Updated 20 months ago by Ram 43k • written 8.5 years ago by Saad Khan ▴ 440

8.5 years ago

jotan ★ 1.3k

This is a lab-based rather than a bioinformatics-based method, but have you considered ChIP-exo? http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3813302/

I believe that if you have a very high-density standard ChIP-seq track, it's possible to mine this data to get the same type of information. In a deeply sequenced ChIP-seq track, it's sometimes possible to identify regions with truncated reads where the termination points mark out the TF binding region. This is also dependent on very good sonication of the samples.

Ram · Accepted Answer · 2015-10-24

No, it would not be safe to assume that the binding site is at the center of the peak. If you know the motif for your protein, you can simply search for that motif in your ChIP-seq peaks. If you do not know the motif, you can use any of a number of software packages that look for enriched sequences across your many peaks to try to define the motif. In both cases, once you have the motif, you "know" the location.
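
As a rough illustration of the "search for a known motif in your peaks" idea, here is a minimal Python sketch. It assumes you have already extracted each peak's DNA sequence as a string and that the motif can be written as an IUPAC consensus; the function names and the example consensus are hypothetical, and a real analysis would more likely score a position weight matrix with a dedicated tool than use an exact consensus match.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    return "".join(IUPAC[base] for base in consensus.upper())


def find_motif_sites(peak_seq, consensus):
    """Return sorted (start, end, strand, offset_from_center) tuples.

    Coordinates are 0-based, half-open, relative to the peak sequence.
    The offset is the match midpoint minus the peak midpoint, in bp.
    """
    peak_seq = peak_seq.upper()
    consensus = consensus.upper()
    center = len(peak_seq) / 2
    hits = []
    for strand, motif in (("+", consensus), ("-", reverse_complement(consensus))):
        # A lookahead is used so that overlapping matches are also reported.
        pattern = re.compile("(?=(" + consensus_to_regex(motif) + "))")
        for match in pattern.finditer(peak_seq):
            start = match.start()
            end = start + len(motif)
            hits.append((start, end, strand, (start + end) / 2 - center))
    return sorted(hits)


# Hypothetical usage: an illustrative consensus and a made-up peak sequence.
for hit in find_motif_sites("C" * 5 + "AGATAA" + "C" * 29, "WGATAR"):
    print(hit)
```

The offset column makes it easy to check, across many peaks, how far the motif matches actually sit from the peak centers. Note that a palindromic consensus will be reported once per strand at the same position.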

8.5 years ago

Alex Reynolds 35k

You could use FIMO from the MEME suite to scan your genome of interest with motif models (JASPAR, etc.), at a p-value threshold that you deem acceptable (e.g., 1e-4, 1e-5).

Take the search result from FIMO and convert it to a UCSC BED-formatted file. This file contains the putative binding sites of all motifs from the motif models across the whole genome.
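
The conversion step could be sketched in Python roughly as follows. This assumes FIMO's tab-separated output with a header row containing the columns `motif_id`, `sequence_name`, `start`, `stop`, `strand` and `score`, and 1-based inclusive coordinates; check this against the output of your FIMO version. The function name `fimo_to_bed` is hypothetical.

```python
import csv


def fimo_to_bed(fimo_tsv_text):
    """Convert FIMO TSV text into a list of BED6-style lines.

    Assumed input columns (from the header row): motif_id, sequence_name,
    start, stop, strand, score. Blank lines and '#' comment lines are skipped.
    """
    rows = (
        line for line in fimo_tsv_text.splitlines()
        if line.strip() and not line.startswith("#")
    )
    reader = csv.DictReader(rows, delimiter="\t")
    records = []
    for row in reader:
        chrom = row["sequence_name"]
        start = int(row["start"]) - 1  # assumed 1-based inclusive -> 0-based
        end = int(row["stop"])         # BED end is exclusive, so unchanged
        # The FIMO score is kept as-is; strict UCSC BED expects an integer
        # score between 0 and 1000, so rescale it if your tools require that.
        records.append((chrom, start, end, row["motif_id"], row["score"], row["strand"]))
    # Sort by chromosome name (lexicographic), then start, then end.
    records.sort(key=lambda r: (r[0], r[1], r[2]))
    return ["\t".join(str(field) for field in record) for record in records]


# Hypothetical usage with file names that are placeholders:
# with open("fimo.tsv") as fh, open("wg-motifs.bed", "w") as out:
#     out.write("\n".join(fimo_to_bed(fh.read())) + "\n")
```

The sort here is only an approximation of what BEDOPS expects; passing the resulting file through `sort-bed` before using it with `bedmap` is the safer choice.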

Then do set operations with BEDOPS tools (like bedmap) to precisely locate putative TF binding sites that overlap with (that is, are contained entirely or partially within) your ChIP-seq peaks. Both input files need to be sorted, e.g. with sort-bed.

$ bedmap --echo --echo-map peaks.bed wg-motifs.bed > answer.bed

If you repeat your experiments with other motifs, you can reuse your whole-genome search result to apply the same set operations.

Updated 4.4 years ago by Ram 43k • written 8.5 years ago by Alex Reynolds 35k

What if there is more than one enriched motif in the peak region because the peak region is large?