Question

What Is The Best Search Engine To Use In RepeatMasker?

score 7


10.2 years ago

Joseph Hughes ★ 3.0k

RepeatMasker works with several search engines: ABBlast, RMBlast, HMMER and cross_match. Have these search engines been benchmarked anywhere in terms of repeat detection, false positives, speed, etc.?
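One way to run such a benchmark yourself is to mask a sequence whose repeat positions are already known (for example, a simulated sequence with inserted repeats) with each engine and score the result base by base. This is a minimal sketch, assuming you have already parsed each engine's masked intervals and the true repeat intervals into half-open (start, end) pairs for a single sequence; the function names are hypothetical and not part of RepeatMasker.

```python
from typing import Dict, Iterable, Set, Tuple

# Half-open interval [start, end) of base positions on a single sequence.
Interval = Tuple[int, int]


def to_base_set(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of base positions they cover."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end))
    return bases


def base_level_metrics(predicted: Iterable[Interval],
                       truth: Iterable[Interval],
                       seq_len: int) -> Dict[str, float]:
    """Score masked intervals from one engine against known repeat intervals."""
    pred = to_base_set(predicted)
    true = to_base_set(truth)
    tp = len(pred & true)        # repeat bases that were masked
    fp = len(pred - true)        # non-repeat bases that were masked
    fn = len(true - pred)        # repeat bases that were missed
    tn = seq_len - tp - fp - fn  # non-repeat bases left unmasked
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


if __name__ == "__main__":
    # Known repeat at 10-20, engine masked 15-25, on a 100 bp sequence.
    print(base_level_metrics([(15, 25)], [(10, 20)], 100))
```

Expanding intervals into Python sets keeps the logic obvious but is memory-hungry; for whole genomes an interval-based comparison (for example with BEDTools) would scale better.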

I'll keep looking, but I haven't found anything yet.


Updated 10.0 years ago by timhowes ▴ 10 • written 10.2 years ago by Joseph Hughes ★ 3.0k

score 9 · Answer 1 · 2014-02-21

It depends on what species you are trying to mask and what your end goal is. RepeatMasker can run for weeks, so it's important to decide up front exactly what you need: do you need all repeats masked as precisely as possible, or do you just want a rough estimate of repetitiveness?

If you are masking a model species like human or Arabidopsis, just download the repeat library for that species and mask with the fastest engine (probably rmblast or ABBlast; the RepeatMasker documentation describes cross_match as slower but often more sensitive) without doing an exhaustive search. For those species, you can even download pre-masked genomes. If you are masking a species closely related to one that already has a repeat library, you may want to take the same approach but with a more sensitive search.
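A minimal Python sketch of that approach, assuming RepeatMasker is on your PATH. It only assembles the command line: `-engine`, `-species`, `-lib`, `-s`/`-q`/`-qq` (slow/quick/rush search), `-pa` (parallel jobs) and `-dir` are RepeatMasker options, but check `RepeatMasker -help` for the exact set in your version. The function name, defaults and the "exactly one of species or lib" rule are assumptions of this sketch.

```python
import shlex
from typing import List, Optional

# Values for RepeatMasker's -engine option (check `RepeatMasker -help`
# for the engines your installation actually has configured).
ENGINES = {"crossmatch", "abblast", "rmblast", "hmmer"}

# Search speed/sensitivity switches: -s slow, -q quick, -qq rush.
SPEED_FLAGS = {"slow": ["-s"], "default": [], "quick": ["-q"], "rush": ["-qq"]}


def build_repeatmasker_cmd(fasta: str,
                           engine: str = "rmblast",
                           species: Optional[str] = None,
                           lib: Optional[str] = None,
                           speed: str = "default",
                           parallel: int = 1,
                           outdir: str = "rm_out") -> List[str]:
    """Assemble (but do not run) a RepeatMasker command line."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")
    if speed not in SPEED_FLAGS:
        raise ValueError(f"unknown speed: {speed}")
    # This sketch requires exactly one repeat source: a species name or a library file.
    if (species is None) == (lib is None):
        raise ValueError("give exactly one of species= or lib=")
    cmd = ["RepeatMasker", "-engine", engine]
    cmd += ["-species", species] if species is not None else ["-lib", lib]
    cmd += SPEED_FLAGS[speed]
    cmd += ["-pa", str(parallel), "-dir", outdir, fasta]
    return cmd


if __name__ == "__main__":
    # Model species with an existing library: fast engine, quick search.
    cmd = build_repeatmasker_cmd("genome.fa", species="human", speed="quick")
    print(shlex.join(cmd))
```

Pass the list to `subprocess.run(cmd, check=True)` to actually run it. For a species closely related to one with a library, passing `lib=` with that library file together with `speed="slow"` gives the more sensitive search.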

If you are working with a non-model system, masking with RepBase libraries becomes very difficult because TEs evolve rapidly. For example, I have found that I can only mask 50% of the bases of sunflower TEs using RepBase, because RepBase contains no closely related species. This highlights the importance of species-specific repeat libraries for masking. If building one is not an option, use a more sensitive, signature-based method such as nhmmer with models from a range of closely related species. That will give you the most sensitivity, but it will be somewhat slower.
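To put a number like that 50% on your own data, you can compare each sequence before and after masking. This sketch assumes the input sequence was uppercase and that you compare one FASTA record at a time; it counts bases that were hard-masked (N, or X with `-x`) or soft-masked (lowercase with `-xsmall`) and skips Ns that were already gaps in the input. The function is hypothetical, not part of RepeatMasker; RepeatMasker's own `.tbl` summary file also reports the percentage of bases masked.

```python
def masked_fraction(original: str, masked: str) -> float:
    """Fraction of non-gap bases that were masked in the output sequence.

    Assumes `original` is the uppercase input sequence and `masked` is the
    matching sequence from RepeatMasker's .masked output, either hard-masked
    (N, or X with -x) or soft-masked (lowercase with -xsmall).
    """
    if len(original) != len(masked):
        raise ValueError("original and masked sequences differ in length")
    considered = masked_bases = 0
    for o, m in zip(original, masked):
        if o.upper() == "N":  # assembly gap or unknown base in the input
            continue
        considered += 1
        if m in "NnXx" or (m.islower() and o.isupper()):
            masked_bases += 1
    return masked_bases / considered if considered else 0.0


if __name__ == "__main__":
    # Four of the eight non-gap bases were hard-masked.
    print(masked_fraction("ACGTACGTNN", "ACGTNNNNNN"))
```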

score 1 · Answer 2 · 2014-05-01

This is what it says on the RepeatMasker Web Server page:

Cross_match is slower but often more sensitive than the other engines. ABBlast (formerly known as WUBlast) is very fast, at a slight cost in sensitivity. RMBlast is a RepeatMasker-compatible version of the NCBI BLAST tool suite. HMMER uses the new nhmmer program to search sequences against the new Dfam database (human only).
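To check these speed descriptions on your own data, you could time each engine on the same input. This is a sketch under several assumptions: every engine you list is installed and configured for your RepeatMasker, `species` has a library available, and a single wall-clock run is representative (repeat the runs to be safer). The `run` parameter defaults to `subprocess.run` and exists only so the function can be exercised without RepeatMasker; the function name and the per-engine output layout are hypothetical.

```python
import os
import subprocess
import time
from typing import Callable, Dict, Sequence


def time_engines(fasta: str,
                 engines: Sequence[str],
                 species: str,
                 out_root: str = "rm_benchmark",
                 run: Callable[..., object] = subprocess.run) -> Dict[str, float]:
    """Run RepeatMasker once per engine on the same input; return wall-clock seconds."""
    timings: Dict[str, float] = {}
    for engine in engines:
        # Separate output directory per engine so runs do not overwrite each other.
        outdir = os.path.join(out_root, engine)
        os.makedirs(outdir, exist_ok=True)
        cmd = ["RepeatMasker", "-engine", engine, "-species", species,
               "-dir", outdir, fasta]
        start = time.perf_counter()
        # With the default subprocess.run, a failed run raises CalledProcessError.
        run(cmd, check=True)
        timings[engine] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    results = time_engines("genome.fa", ["rmblast", "crossmatch"], "human")
    for engine, secs in results.items():
        print(f"{engine}\t{secs:.1f} s")
```

Timing alone only answers half the question; since the engines trade speed against sensitivity, it is worth pairing these numbers with an accuracy measure on a sequence whose repeats are known.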