Why do peak analysis and motif analysis most often use a whole-genome background when there is no control to compare against?
When I ran 20k peaks through motif analysis, I picked 5000 target sequences and 40k background sequences. Why are the numbers different? Does the difference affect the p-values, which compare the % of target sequences that have motif X with the % of background sequences that have motif X?
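
To make the p-value part of the question concrete, here is a minimal sketch of how I understand the comparison. It assumes the tool scores enrichment with a one-sided hypergeometric test on the pooled target and background sequences, which may not be what a given motif finder actually does. The function name `enrichment_p` and the motif counts in the usage lines are made up for illustration; only the 5000 and 40k set sizes come from my run.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def enrichment_p(target_hits, n_target, bg_hits, n_bg):
    """One-sided hypergeometric p-value (assumed test, not a specific tool's).

    Probability of seeing at least `target_hits` motif-containing sequences
    among `n_target` sequences drawn at random from the pooled set of
    target and background sequences.
    """
    total = n_target + n_bg            # pooled number of sequences
    with_motif = target_hits + bg_hits  # pooled number containing the motif
    without_motif = total - with_motif

    # Range of feasible hit counts in the target set, starting at the observed one.
    low = max(target_hits, n_target - without_motif)
    high = min(with_motif, n_target)
    if low > high:
        return 0.0

    log_denominator = log_comb(total, n_target)
    log_terms = [
        log_comb(with_motif, i)
        + log_comb(without_motif, n_target - i)
        - log_denominator
        for i in range(low, high + 1)
    ]

    # Sum the terms in log space to avoid underflow for very small p-values.
    peak = max(log_terms)
    log_p = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return min(1.0, math.exp(log_p))


# Hypothetical counts: motif X in 10% of targets and 5% of background.
p_unequal = enrichment_p(500, 5000, 2000, 40000)  # 5000 targets vs 40k background
p_equal = enrichment_p(500, 5000, 250, 5000)      # same percentages, equal set sizes
print(p_unequal, p_equal)
```

Under this assumed test the set sizes enter the calculation directly, so the same two percentages can give different p-values depending on how many background sequences are used.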