Question

Generating a background file to be used by Homer for motif identification

2.9 years ago

gkunz ▴ 30

I am looking to generate a consistent background file that can be used by homer for motif finding. Does anyone have recommendations for how to go about doing this?

Each time motif finding is performed with homer it will generate its own background file. This process results in inconsistent outputs.

There have been a number of posts regarding this or similar questions, but not much clarity has come about.

Any information on how to go about accomplishing this would be greatly appreciated.

how to set a appropriate background file when using HOMER findMotifGenome.pl

Every time the results of findMotifGenome.pl in HOMER is different

Discrepancy in HOMER output

Using Homer Programs For Denovo Motif : Different Results Among Its Own Versions

Extracting background utilized by Homer for motif finding

homer • 1.5k views

score 0 · Answer 1 · 2021-05-23

2.9 years ago

boczniak767 ▴ 850

The appropriate background file depends on your experimental design.

Take a look at the practical tips from HOMER. If you have a control experiment, you can use it as the background. Alternatively, you can create random peaks within regions defined as promoters.
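One way to make the "random peaks in promoter regions" idea reproducible is to sample the regions yourself with a fixed random seed, so the same background is used in every run. The following is a minimal sketch, not HOMER's own procedure; the promoter intervals, the window width, the function names, and the output file name are all hypothetical placeholders.

```python
import random

def random_background(regions, n, width=200, seed=42):
    """Sample n fixed-width windows from candidate regions, reproducibly.

    regions: list of (chrom, start, end) tuples, e.g. promoter intervals.
    Regions are picked uniformly, regardless of their length (a simplification).
    """
    rng = random.Random(seed)  # fixed seed -> identical background on every run
    usable = [r for r in regions if r[2] - r[1] >= width]
    if not usable:
        raise ValueError("no region is at least `width` bp long")
    rows = []
    for i in range(n):
        chrom, start, end = rng.choice(usable)
        s = rng.randint(start, end - width)  # window stays inside the region
        rows.append((chrom, s, s + width, f"bg_{i}", ".", "+"))
    return rows

def write_bed(rows, path):
    """Write rows as a tab-separated, 6-column BED file."""
    with open(path, "w") as fh:
        for row in rows:
            fh.write("\t".join(map(str, row)) + "\n")

# Hypothetical promoter intervals; replace with real annotation.
promoters = [("chr1", 1000, 3000), ("chr1", 50000, 52000), ("chr2", 7000, 9000)]
background = random_background(promoters, n=5, width=200, seed=42)
write_bed(background, "background.bed")
```

The resulting file could then be supplied as a custom background, for example through the `-bg` option of `findMotifsGenome.pl`, assuming your HOMER version supports it. Keeping the seed and the file under version control is what makes the runs comparable.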

I don't have much experience, but the one thing I do know is that there is no gold standard, and motif analysis returns a lot of garbage.

I plan to use a few tools and compare the results. You can look at this article: Kulkarni et al. 2019.

Generally, motifs that are detected by different tools, or in separate runs with random backgrounds, are likely to be OK.
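The idea of trusting motifs that recur across tools or runs can be sketched as a simple counting step. This is only an illustration: it assumes each run has already been reduced to a list of motif names (for de novo motifs this would mean something like the best-matching known motif), and the names and the `min_runs` threshold below are made up.

```python
from collections import Counter

def recurrent_motifs(runs, min_runs=2):
    """Return motif names that appear in at least `min_runs` separate runs."""
    # set(run) ensures a motif is counted at most once per run
    counts = Counter(m for run in runs for m in set(run))
    return sorted(m for m, c in counts.items() if c >= min_runs)

# Hypothetical motif lists from three runs with different random backgrounds
runs = [
    ["CTCF", "AP-1", "SP1"],
    ["CTCF", "AP-1", "GATA1"],
    ["CTCF", "NFkB"],
]
print(recurrent_motifs(runs, min_runs=2))  # ['AP-1', 'CTCF']
```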

Thanks for the response and the input.

I have definitely taken a look at this before (I have tried my best to digest the extensive ~300 or so pages of HOMER documentation), but I will say that it still leaves me with a fair amount of uncertainty about the best approach to take.

I am working with enriched genomic regions that have been identified as differential between two sets of triplicates. I am not looking for large differences between the two groups, but am really interested in what is just a few regions, depending on the comparison of interest. The samples are really similar, which, based on the experimental design, they should be for the most part. This would lead me to interpret the practical tips as: select pretty much any sequence that is shared, has representative GC content, and is present in sufficient quantity to serve as background.

Still, doing that properly is a task in itself, and even if done successfully and appropriately, I would be hard pressed to think it any stronger than the background generated by HOMER. The only difference would be that it is at least consistent, and this is not even necessarily an advantage, because it is clear that altering the background sequence will yield differences in the motifs identified. Additionally, I think this effect is further amplified if the number of sequences being analyzed is low.
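The "representative GC content" criterion can be illustrated by binning sequences by GC fraction and drawing background sequences from the same bin as each target. This is a rough sketch under stated assumptions, not how HOMER normalizes GC content internally; the sequences, the bin count, and the function names are hypothetical.

```python
import random
from collections import defaultdict

def gc_fraction(seq):
    """Fraction of G/C among the A/C/G/T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def gc_bin(seq, n_bins=20):
    """Map a sequence to one of n_bins equal-width GC bins."""
    return min(int(gc_fraction(seq) * n_bins), n_bins - 1)

def gc_matched_background(targets, candidates, per_target=1, seed=0):
    """Pick up to per_target candidates per target from the same GC bin."""
    rng = random.Random(seed)  # fixed seed keeps the selection consistent
    pool = defaultdict(list)
    for seq in candidates:
        pool[gc_bin(seq)].append(seq)
    for bucket in pool.values():
        rng.shuffle(bucket)
    chosen = []
    for t in targets:
        bucket = pool[gc_bin(t)]
        for _ in range(per_target):
            if bucket:  # a bin may run out of candidates
                chosen.append(bucket.pop())
    return chosen

# Hypothetical toy sequences
targets = ["GCGCATAT"]
candidates = ["ATATATAT", "GGCCAATT", "GCGCGCGC", "CCGGTTAA"]
background = gc_matched_background(targets, candidates, per_target=1, seed=0)
```

With few target regions, some GC bins will hold very few candidates, which is one concrete way the small-sample sensitivity described above can show up.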

I too am hoping to do some comparisons across motif finding programs to increase confidence in identified motifs, but anticipate running into similar issues.

Thank you for the linked paper, I will take a look!

ADD REPLY • link 2.9 years ago by gkunz ▴ 30