Question

Homer motif analysis: "Very close % of target vs % of Background sequences with Motif" & "SeqBias at 1st rank"


6.6 years ago

chiefcat ▴ 180

Hi everyone. I'm trying to find the motifs bound by my ChIP-ed protein using Homer. I have no background in NGS analysis; the analysis was done by a labmate who knows programming but lacks experience analysing ChIP-seq data.

Going through the results list, I noticed two obvious issues.

1) The % of target and % of background sequences with the motif are very close throughout the list. Does this mean that the motifs found are actually not enriched, or only very weakly enriched, in my ChIP-ed sample?
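One way to make "very close" concrete is to turn the two percentages in each row into a fold enrichment and flag rows below a rough cutoff. This is a minimal sketch, assuming you copy the "% of Targets" and "% of Background" values from the HOMER results table by hand; the function names are hypothetical, and the 1.5-fold cutoff is only a rule of thumb, not a HOMER setting. Fold enrichment alone also ignores how many sequences were tested, which HOMER's p-value does account for.

```python
def fold_enrichment(pct_target: float, pct_background: float) -> float:
    """Ratio of motif frequency in target vs background sequences."""
    if pct_background <= 0:
        raise ValueError("background percentage must be positive")
    return pct_target / pct_background


def flag_weak_motifs(rows, min_fold=1.5):
    """Return motif names whose fold enrichment is below min_fold.

    rows: iterable of (motif_name, pct_target, pct_background) tuples,
    e.g. values transcribed from a HOMER results table.
    min_fold is an illustrative rule-of-thumb cutoff (assumption).
    """
    return [name for name, t, b in rows if fold_enrichment(t, b) < min_fold]


# Hypothetical values, for illustration only
rows = [
    ("motif_A", 30.0, 10.0),  # 3.0-fold: clearly enriched
    ("motif_B", 22.0, 20.0),  # 1.1-fold: target and background nearly equal
]
print(flag_weak_motifs(rows))  # ['motif_B']
```

If most rows in the list behave like `motif_B`, the motifs are barely distinguishable from background, which is the situation described above.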

2) "SeqBias: CG bias" appears as the top-ranked motif in my list. I know this indicates that the analysis parameters are not appropriate, as stated in http://homer.ucsd.edu/homer/motif/practicalTips.html. If I understand the tutorial correctly, this is probably related to the normalization, either to total GC content or to CpG content. Should I just add "-gc" to the motif-finding command to solve this problem? And if this problem shows up in my list, are the other results still valid?
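A quick way to check whether the CG-bias row reflects a real compositional difference is to compare GC content and the CpG observed/expected ratio between target and background sequences. This is a sketch, assuming both sets are available as plain DNA strings (for example extracted from FASTA files); the helper names are hypothetical and this is not how HOMER computes its own normalization. A large gap in either statistic suggests the background is not well matched to the peaks.

```python
from statistics import mean


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases (case-insensitive) in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count
    return seq.count("CG") * len(seq) / (c * g)


def compare_composition(targets, background):
    """Mean GC fraction and mean CpG o/e for target and background sets."""
    return {
        "target_gc": mean(gc_fraction(s) for s in targets),
        "background_gc": mean(gc_fraction(s) for s in background),
        "target_cpg_oe": mean(cpg_obs_exp(s) for s in targets),
        "background_cpg_oe": mean(cpg_obs_exp(s) for s in background),
    }


# Toy sequences, for illustration only
targets = ["CGCGATCGCG", "GCGCCGTA"]
background = ["ATATTAGCAT", "TTAGCATAAT"]
print(compare_composition(targets, background))
```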

Thanks very much!




Almost six years later, I am facing a similar situation when interpreting my own ChIP-seq data. Were you able to figure out what the issue was?

Thank you!

Reply by os306 ▴ 10, 8 months ago

score 1 · Answer 1 · 2017-09-05

Hi, I usually get decent enrichment over background, sometimes up to 3-fold, and I would expect at least 1.5-fold. Can you give us your command line? What does your BED file (input) look like? Could it be that you are searching for the motif in too large an area? Usually you search for the motif within +/-100 bp of the peak summit, and within +/-250 bp for "co-binding" TFs. If you search for a motif in large areas (e.g. the entire peak), you may end up with high background.
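The "search around the summit" advice can be sketched as trimming each peak to a fixed window centred on its summit before motif finding. This assumes MACS2-style narrowPeak input, where column 10 is the summit offset from the peak start (or -1 if no summit was called); the function name is hypothetical. As far as I know, HOMER's `-size` option also sets a fixed-width region around the centre of each input peak, so summit-centred input combined with that option gives a similar effect.

```python
def summit_windows(lines, half_width=100):
    """Yield 4-column BED records of width 2*half_width centred on summits.

    Assumes narrowPeak columns: chrom, start, end, name, score, strand,
    signalValue, pValue, qValue, summit_offset (relative to start).
    Falls back to the peak midpoint when the offset is -1 or missing.
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        offset = int(fields[9]) if len(fields) > 9 else -1
        summit = start + offset if offset >= 0 else (start + end) // 2
        # Clip at 0; windows near chromosome starts end up shorter.
        # Clipping at the chromosome end is not handled here.
        win_start = max(0, summit - half_width)
        win_end = summit + half_width
        yield f"{chrom}\t{win_start}\t{win_end}\t{name}"


peaks = [
    "chr1\t1000\t1600\tpeak1\t500\t.\t8.2\t20.1\t18.3\t250\n",
    "chr2\t50\t400\tpeak2\t300\t.\t5.0\t10.0\t8.0\t30\n",
]
for rec in summit_windows(peaks):
    print(rec)
```

With `half_width=250` the same helper produces the wider windows suggested for co-binding factors.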

Also, have you removed the "blacklisted" regions from your ChIP-seq peaks? These are regions that get called as peaks in virtually every ChIP experiment, so removing them gets rid of that background (search Google for "blacklisted regions" plus your genome).
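In practice blacklist filtering is usually done with a tool such as `bedtools intersect -v`. The Python sketch below only illustrates the logic (drop every peak that overlaps any blacklisted interval), using half-open BED coordinates. The intervals are made up and the helper names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(blacklist):
    """Merge blacklist intervals per chromosome into sorted start/end lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in blacklist:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= merged[-1][1]:  # overlapping or touching: extend
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, start, end):
    """True if half-open [start, end) overlaps any blacklisted interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(ends, start)  # first merged interval ending after start
    return i < len(starts) and starts[i] < end


def remove_blacklisted(peaks, blacklist):
    """Keep peaks (chrom, start, end, ...) that avoid every blacklisted region."""
    index = build_index(blacklist)
    return [p for p in peaks if not overlaps(index, *p[:3])]


blacklist = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
peaks = [
    ("chr1", 250, 400, "p1"),  # overlaps merged chr1:100-300
    ("chr1", 300, 350, "p2"),  # starts exactly where the blacklist ends
    ("chr2", 50, 90, "p3"),
    ("chr3", 0, 10, "p4"),
]
print([p[3] for p in remove_blacklisted(peaks, blacklist)])
```

Merging first keeps the end coordinates sorted, which is what makes the single `bisect_right` lookup valid.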

As the HOMER documentation suggests, I would try to match the background to your ChIP-seq peaks. Are they mostly promoters? Then generate a background file of promoter regions, and so on. By default, HOMER uses randomly picked genomic regions (matched for GC content) as background.
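To decide whether a promoter-matched background makes sense, you could first check what fraction of peaks lie near a TSS and, if that fraction is high, build background windows around TSSs that carry no peak. This is a sketch under stated assumptions: TSS positions come from a gene annotation as sorted lists per chromosome, and the +/-1 kb promoter definition and 200 bp windows are hypothetical choices. The function names are made up. The resulting coordinates could be written to a BED file and passed to findMotifsGenome.pl as a custom background with `-bg`.

```python
from bisect import bisect_left


def nearest_distance(sorted_by_chrom, chrom, pos):
    """Distance from pos to the closest position on chrom (None if none).

    sorted_by_chrom: dict mapping chrom -> sorted list of positions.
    """
    sites = sorted_by_chrom.get(chrom)
    if not sites:
        return None
    i = bisect_left(sites, pos)
    candidates = [sites[j] for j in (i - 1, i) if 0 <= j < len(sites)]
    return min(abs(pos - s) for s in candidates)


def promoter_fraction(peak_centres, tss_by_chrom, max_dist=1000):
    """Fraction of (chrom, pos) peak centres within max_dist bp of a TSS."""
    if not peak_centres:
        return 0.0
    hits = 0
    for chrom, pos in peak_centres:
        d = nearest_distance(tss_by_chrom, chrom, pos)
        if d is not None and d <= max_dist:
            hits += 1
    return hits / len(peak_centres)


def promoter_background(tss_by_chrom, peak_centres, max_dist=1000, half_width=100):
    """(chrom, start, end) windows around TSSs with no peak within max_dist."""
    peaks_by_chrom = {}
    for chrom, pos in peak_centres:
        peaks_by_chrom.setdefault(chrom, []).append(pos)
    for positions in peaks_by_chrom.values():
        positions.sort()
    windows = []
    for chrom, sites in tss_by_chrom.items():
        for tss in sites:
            d = nearest_distance(peaks_by_chrom, chrom, tss)
            if d is None or d > max_dist:
                windows.append((chrom, max(0, tss - half_width), tss + half_width))
    return windows


# Hypothetical TSSs and peak centres, for illustration only
tss = {"chr1": [1000, 5000, 9000], "chr2": [2000]}
peak_centres = [("chr1", 1100), ("chr1", 9500), ("chr2", 7000)]
print(promoter_fraction(peak_centres, tss))
print(promoter_background(tss, peak_centres))
```

Excluding TSSs near peaks keeps the background from containing the bound sites themselves, which would otherwise dilute the enrichment you are trying to measure.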