Question

Background for motif analysis



7.5 years ago

mariamari693

I have some problems choosing background sequences for motif analysis. Let's say I have 3 different conditions (DNase I data). I clustered the peaks and want to see the differences in motif enrichment among the clusters. Each cluster has a different number of intervals, but for motif analysis I resized them all to the same length (200 bp, centered on the summits). My questions are:
(1) Should I use the total DNase I peaks (merged from all 3 conditions) as background, or genomic regions? Which one is better, or does it make any difference?
(2) To do the motif search for each cluster, should I use the same set of background sequences, or should I match the background to each cluster separately based on GC% and number of regions?
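For reference, resizing peaks to fixed-width windows around their summits can be sketched as below. This is a minimal illustration, assuming BED-style 0-based half-open coordinates and a summit given as an offset from the peak start (as in column 10 of a narrowPeak file); the `Peak` type and `resize_to_summit` function are hypothetical names, not part of any tool.

```python
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    chrom: str
    start: int   # 0-based, half-open (BED-style) peak start
    end: int
    summit: int  # summit offset relative to start (assumed narrowPeak-style)


def resize_to_summit(peaks: List[Peak], width: int = 200) -> List[Tuple[str, int, int]]:
    """Return fixed-width (chrom, start, end) windows centred on each summit."""
    half = width // 2
    windows = []
    for p in peaks:
        centre = p.start + p.summit
        # Clip at the chromosome start but keep the full width; clipping at the
        # chromosome end would need a chromosome-sizes table (not handled here).
        win_start = max(0, centre - half)
        windows.append((p.chrom, win_start, win_start + width))
    return windows
```

Every cluster then contributes sequences of identical length, so differences in hit rates are not driven by region size.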

next-gen ChIP-Seq sequencing

updated 7.5 years ago by ejm32 • written 7.5 years ago by mariamari693

score 1 · Answer 1 · 2016-10-21

The short answer is: try all three!

My thoughts on the matter:

If you use the superset of DHS peaks as background, you may not get any hits: each cluster is itself part of that superset, so the smaller sets may not show enough enrichment over it.
If the different sets of peaks have very different sequence composition, the superset will not be a good background, because it will not accurately reflect the composition (e.g. GC content) of any individual set.
I would let the motif-finding program choose the background sequences for you, and then repeat the analysis for every cluster with that same set of background sequences.
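A minimal Python sketch of the points above, for illustration only. `fold_enrichment` shows how a superset background that contains the cluster dilutes its signal, and `sample_gc_matched` draws one GC-matched background that can be reused for every cluster. All function names are hypothetical, the example counts in the note below are made up, and `candidate_seqs` is assumed to be a pool of random 200 bp genomic windows that you extract yourself; a real motif tool's own background selection may work differently.

```python
import random
from collections import Counter, defaultdict


def fold_enrichment(fg_hits, fg_total, bg_hits, bg_total):
    """Fraction of foreground sequences with a motif hit over the background fraction."""
    return (fg_hits / fg_total) / (bg_hits / bg_total)


def gc_fraction(seq):
    """G+C share of the unambiguous A/C/G/T bases in seq (N etc. are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_bin(seq, n_bins=10):
    """Index of the GC-content bin (0 .. n_bins-1) that seq falls into."""
    return min(int(gc_fraction(seq) * n_bins), n_bins - 1)


def sample_gc_matched(target_seqs, candidate_seqs, n_bins=10, seed=0):
    """Draw a background whose size and GC-bin histogram match target_seqs.

    A fixed seed makes the draw reproducible, so the same background can be
    reused for every cluster. Raises ValueError if a GC bin has too few candidates.
    """
    rng = random.Random(seed)
    needed = Counter(gc_bin(s, n_bins) for s in target_seqs)
    pool = defaultdict(list)
    for s in candidate_seqs:
        pool[gc_bin(s, n_bins)].append(s)
    background = []
    for b, count in sorted(needed.items()):
        if len(pool[b]) < count:
            raise ValueError(f"GC bin {b}: need {count}, have {len(pool[b])}")
        background.extend(rng.sample(pool[b], count))
    return background
```

With made-up numbers: a cluster of 200 sequences with 60 hits against a genomic background of 10,000 sequences with 1,000 hits gives a fold enrichment of about 3.0. Against a superset made of that cluster plus two other 200-sequence clusters with 20 hits each (100 hits in 600 sequences), it drops to about 1.8. Calling `sample_gc_matched` once on all clusters pooled and reusing the result follows the "same background" advice; calling it per cluster gives the per-cluster GC matching asked about in question (2).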

Good luck!