Question

ChIP-Seq Replicates And Motif Discovery: What Is The Most Sound Way To Deal With The Merged Peaks (Peak Consensus)?

0

10.4 years ago

Nick ▴ 290

I have a ChIP-seq experiment in triplicate (3 treatment, 3 control samples, each of the 6 with its own input control). I identified the peaks with macs14 for each sample against its input control. Then I performed differential binding analysis using DiffBind, which produced a set of merged peaks (a peak consensus). Now I would like to proceed with motif discovery using meme-chip or rsat.

What is the most methodologically sound way to do motif discovery from the merged peaks/peak consensus?

AFAIK meme-chip/rsat expect relatively narrow sequences around the peak summits, whereas DiffBind merges peaks and so produces longer peak sequences. Should I de-merge the consensus peaks? Or merge all treatment samples into a single sample and call the peaks from that? My uncertainty stems from the fact that motif discovery tools seem to expect a single sample, rather than a set of replicates, each introducing some noise and variation in peak location (and some lacking some of the peaks altogether).

chip-seq motif replicates • 6.5k views

updated 7.4 years ago by Biostar 20 • written 10.4 years ago by Nick ▴ 290

2

I think you may need to consider what you know or assume about the protein first. Do you know/assume that the protein you ChIP'd binds DNA directly, and if so, do you think it binds a specific motif, rather than, say, a more nebulous stretch of DNA enriched for some nucleotide content (CpG, for example) that isn't a motif per se?

If the protein binds DNA directly and binds a specific motif, it will be either bound or unbound in the various replicates. If it binds another factor, isn't constrained by a particular motif, or is mobile (it could bind to a nucleosome, for example), you could expect differences among conditions to show up both in bound vs. unbound status and in where the protein is bound.

If you assume the protein binds a specific motif and is constrained by it, then it may not matter how you find the motif; your methodology may only alter how many motifs are discovered. The motif should exist in the data from the individual replicates as well as in the merged peaks, because it is the motif, after all, that is directing binding. However, if the protein can be expected to shift positions within a particular underlying window, then the intersection of the merged peaks may well represent the DNA between two bound locations that isn't itself actually bound, if that makes sense?

10.3 years ago by bede.portz ▴ 540

1

Have you checked Irreproducible Discovery Rate (IDR)?

10.4 years ago by arnstrm ★ 1.8k

0

Thanks, I did, but this is not what I asked. I used DiffBind precisely in order to take care of the variation (i.e. in lieu of IDR). What I asked is: I have peaks that have been reproduced, i.e. they overlap in multiple samples. Yet they are not absolutely identical: the consensus peaks are built from the overlapping peaks and are, essentially, the UNION of the overlapping peaks. So the question is: should I search for a motif in the consensus peaks (i.e. in the UNION of the overlapping peaks) or in their INTERSECTION? Or should I, perhaps, split each consensus peak into its constituent peaks? Submitting the consensus or the intersection is the least hassle, but I wonder what would be the most methodologically sound solution.
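
To make the UNION vs. INTERSECTION distinction concrete, here is a minimal sketch in plain Python. It is not DiffBind's implementation; the coordinates and the function names `union_interval` and `intersection_interval` are made up for illustration, and it assumes the peaks all belong to one overlapping group on the same chromosome.

```python
# Hypothetical peak calls for one binding site in three replicates,
# given as (start, end) coordinates on the same chromosome.
replicate_peaks = [(1000, 1400), (1100, 1550), (1050, 1450)]


def union_interval(peaks):
    """Span covered by any of the overlapping peaks (the merged peak)."""
    return min(s for s, _ in peaks), max(e for _, e in peaks)


def intersection_interval(peaks):
    """Span covered by every peak, or None if there is no common span."""
    start = max(s for s, _ in peaks)
    end = min(e for _, e in peaks)
    return (start, end) if start < end else None


union = union_interval(replicate_peaks)
shared = intersection_interval(replicate_peaks)

print("union:", union, "width:", union[1] - union[0])
print("intersection:", shared, "width:", shared[1] - shared[0])
```

With these made-up coordinates the union is wider than any single replicate's peak, while the intersection is narrower than all of them, which is the trade-off the question is about.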

10.4 years ago by Nick ▴ 290

score 5 · Answer 1 · 2014-04-15

Hi Nick, the latest version of DiffBind (1.10 in Bioc 2.14) has a new feature we're using for just this purpose (feeding meme-chip for motif identification). You can now center peaks on the summit (point of highest pileup across the samples). If you set the "summits" parameter in dba.count, it will first compute a summit for each peak for each sample, then derive a consensus summit for each peak over all the samples. If the value of "summits" is a number, it will then re-center the peaks, including "summits" base-pairs up- and down-stream of the summit. So saying dba.count(DBA,summits=250) will give you 500bp peaks (250 upstream/downstream) centered on the point of highest pileup. We're finding this gives much better results when doing motif analysis!
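
The re-centering idea can be illustrated with a small sketch in plain Python. This is not DiffBind's code: the helper names `consensus_summit` and `recenter`, the read depths, and the choice to sum pileup across samples to pick the consensus summit are all assumptions made for illustration.

```python
def consensus_summit(pileups):
    """Return the position with the highest total pileup across samples.

    `pileups` is a list of dicts, one per sample, mapping position -> depth.
    Summing depths across samples is an assumption made for this sketch.
    """
    totals = {}
    for sample in pileups:
        for pos, depth in sample.items():
            totals[pos] = totals.get(pos, 0) + depth
    # Highest total wins; ties go to the leftmost position.
    return min(totals, key=lambda pos: (-totals[pos], pos))


def recenter(summit, flank=250):
    """Fixed-width window of `flank` bp on each side of the summit."""
    return max(0, summit - flank), summit + flank


# Hypothetical read depths at a few positions in three replicates.
samples = [
    {1200: 30, 1250: 45, 1300: 40},
    {1200: 20, 1250: 35, 1300: 50},
    {1200: 25, 1250: 40, 1300: 25},
]

summit = consensus_summit(samples)
window = recenter(summit, flank=250)
print(summit, window, window[1] - window[0])
```

Every peak ends up with the same width regardless of how wide the merged peak was, which is why fixed windows around the summit suit tools that expect narrow, centered sequences.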