This question is about taking a bioinformatics analysis through to downstream analysis.
I have identified 3 ChIP-seq motifs (I ran ChIP-seq, called peaks, identified enriched regions, and then identified motifs), each around 7-9 letters long with one mismatch. I took the coordinates of these 3 motif sequences (filtered at p-value < 0.001) and intersected them with the coordinates of all the enriched regions I had initially identified, in order to find out which genes in my data set each selected motif is enriched in. When I used bedtools, I used the -wo option with 50% overlap.
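To make the overlap criterion concrete, here is a minimal sketch of what a 50% overlap test amounts to. It assumes BED-style 0-based, half-open intervals and that the fraction is measured relative to the motif interval (as with bedtools' `-f 0.5`); the function names and the coordinates are hypothetical and are not taken from my actual data.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of overlapping bases between two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def passes_overlap(motif, region, min_fraction=0.5):
    """True if at least min_fraction of the motif interval lies in the region.

    motif and region are (start, end) tuples on the same chromosome.
    """
    motif_len = motif[1] - motif[0]
    if motif_len <= 0:
        return False
    shared = overlap_bp(motif[0], motif[1], region[0], region[1])
    return shared / motif_len >= min_fraction


# Hypothetical example: an 8 bp motif hanging over the edge of a region.
print(passes_overlap((96, 104), (0, 100)))   # 4 of 8 bases overlap
print(passes_overlap((98, 106), (0, 100)))   # 2 of 8 bases overlap
```

With a threshold of 50%, a motif site that is only half inside an enriched region is still reported, so the reported site is not guaranteed to lie fully within the region.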
Now, when I check the positions in the sequence that correspond to the motif, I don't find the exact sequence. I should see the motif (e.g. CMGGATC) with at most one base mismatch. What I find instead is 3-4 seemingly random matching bases at the same location (e.g. AGGGAA). How can I be sure that I am not designing primers or an EMSA assay for a false motif? Shouldn't the actual site contain almost all letters of the motif, apart from one mismatch?
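The check I have in mind can be sketched as follows: scan the extracted sequence for the motif, treating IUPAC codes such as M (A or C) as ambiguous, allowing at most one mismatch, and looking at both strands. This is only a minimal illustration under those assumptions, not what MEME does internally; the function names and the test sequence are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (IUPAC codes allowed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(pattern, window):
    """Count positions where the base is not allowed by the IUPAC code."""
    return sum(base not in IUPAC[code] for code, base in zip(pattern, window))


def find_motif(motif, seq, max_mismatches=1):
    """Return (start, strand, matched_sequence, n_mismatches) for each hit.

    start is 0-based on the given sequence; '-' hits are reported as the
    forward-strand sequence that matches the reverse-complemented motif.
    """
    motif, seq = motif.upper(), seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for start in range(len(seq) - len(pattern) + 1):
            window = seq[start:start + len(pattern)]
            n = mismatches(pattern, window)
            if n <= max_mismatches:
                hits.append((start, strand, window, n))
    return hits


# Hypothetical sequence containing one exact forward-strand site.
print(find_motif("CMGGATC", "TTCAGGATCAA"))
```

If a scan like this finds no hit at the reported coordinates on either strand, I would suspect the coordinates or the strand rather than the motif itself, but that is my guess and part of what I am asking.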
As you will agree, this will require a lot of effort and time. In case you wonder what I used to do all this: I used the MEME Suite and took advice from the discussion on my previous post, A: motif seq in enriched genes