Hi Everyone,
I am trying de novo motif discovery with HOMER for the first time. My sequence file has about 9K sequences of 30 bp each. I used two of the HOMER programs for de novo motif finding, findMotifsGenome.pl and findMotifs.pl:
-bash-3.2$ findMotifsGenome.pl input.bed hg19 homer_output -bg background.bed -size given
-bash-3.2$ findMotifs.pl input.fasta fasta homer_output1 -fasta background.fasta
Firstly, the number of results and the reported matches to the de novo motifs differ between the two programs.
I also ran the "homer2 denovo" command and found that its results are reasonably consistent with those from findMotifs.pl, but not with those from findMotifsGenome.pl. I suspect that when I supply a BED file, HOMER fetches extra bp upstream and downstream of the specified coordinates, and that this is why I get far more de novo motifs from findMotifsGenome.pl than from the other two.
Is there a reason why the extra bp are required or significant in HOMER? Or is there a way to make HOMER fetch only the length I want from the BED file?
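To rule out the BED file itself, the interval lengths it specifies can be tabulated before looking at the extraction step. Below is a minimal sketch, assuming a standard whitespace-delimited BED file with 0-based, half-open coordinates (so length = end - start). The function name `bed_length_counts` and the sample records are made up for illustration and are not part of HOMER.

```python
from collections import Counter


def bed_length_counts(lines):
    """Count interval lengths (end - start) over BED-formatted lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and the usual comment/header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        # BED columns: chrom, start (0-based), end (exclusive), ...
        start, end = int(fields[1]), int(fields[2])
        counts[end - start] += 1
    return counts


# Hypothetical records, for illustration only.
example = [
    "chr1\t100\t130\tseq1\t0\t+",
    "chr1\t500\t530\tseq2\t0\t-",
    "chr2\t1000\t1031\tseq3\t0\t+",
]
print(bed_length_counts(example))
```

On a real file, the same function can be called on an open file handle, for example `bed_length_counts(open("input.bed"))`. If every interval comes back as 30, the BED coordinates agree with the FASTA input and the difference probably arises elsewhere.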
PS: I also used the -size option, but I don't see any difference. From the HOMER manual on the -size option: if you wish to find motifs using your peaks using their exact sizes, use the option "-size given". However, for Transcription Factor peaks, most of the motifs are found +/- 50-75 bp from the peak center, making it better to use a fixed size rather than depend on your peak size.
Is there anything I am missing? I would appreciate any help with this.