Better strategies for repeat masking?
8.2 years ago

I'm using MAKER to annotate all of my plant genomes on a single server, and the biggest bottleneck is RepeatMasker with RepBase. About 70% of the run time is spent in RepeatMasker, with the rest going to AUGUSTUS/SNAP. That makes sense: RepeatMasker does a lot of BLAST-style searching, which takes very long on a single server.

I'm trying to think of ways to speed this up. So far my plan is to split the reference FASTA, run RepeatMasker on the pieces on a cluster (where I can't run MAKER at the moment), merge the masked FASTA files, and then give the result to MAKER with its RepeatMasker step turned off. Are there any alternative, faster algorithms I could use for repeat masking, or any other ideas?
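
The split-and-merge plan above could be sketched roughly as follows. This is only a minimal illustration: the function names, the batch file naming, and the `max_bases` default are assumptions made for the example and are not part of MAKER or RepeatMasker. Whole sequences are kept together, so a single very long chromosome still ends up in one batch.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, out_dir, max_bases=50_000_000):
    """Write whole sequences into batch files of roughly max_bases each.

    Sequences are never cut, so one batch can exceed max_bases if a
    single sequence is longer than the limit (hypothetical helper).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    batches, handle, size = [], None, 0
    for header, seq in read_fasta(path):
        # Start a new batch when the current one would overflow.
        if handle is None or (size > 0 and size + len(seq) > max_bases):
            if handle is not None:
                handle.close()
            batch_path = out_dir / f"batch_{len(batches):04d}.fa"
            batches.append(batch_path)
            handle = open(batch_path, "w")
            size = 0
        handle.write(f">{header}\n{seq}\n")
        size += len(seq)
    if handle is not None:
        handle.close()
    return batches


def merge_fasta(paths, out_path):
    """Concatenate (masked) FASTA files in the given order."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                for line in handle:
                    out.write(line)
```

Each batch would then be submitted as its own cluster job. RepeatMasker writes the masked sequence to a file named after the input with a `.masked` suffix, so the merge step would collect those files; a batch with no detected repeats may not produce one, in which case the original batch file can be merged instead.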

Or could I even skip RepeatMasker in MAKER and just filter the resulting transcripts with blastn against RepBase? Then at least the search space would be much smaller. On the other hand, the noise from unmasked regions might lead to a massive increase in the number of predicted transcripts. I haven't tested this; has anyone? I'm using already-trained versions of AUGUSTUS/SNAP, so hopefully these shouldn't be swayed too much by an increase in repeats.
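
For illustration, the post-hoc filtering idea could look something like the sketch below. It assumes the transcripts were searched against the repeat library with blastn in the standard 12-column tabular format (`-outfmt 6`), where columns 7 and 8 are the query start and end. The function names and the 50% cut-off are hypothetical choices for the example, not a tested recommendation.

```python
def repeat_coverage(blast_tab_lines, lengths):
    """Return the fraction of each transcript covered by repeat hits.

    blast_tab_lines: iterable of BLAST tabular (outfmt 6) lines.
    lengths: dict mapping transcript ID to transcript length in bases.
    """
    intervals = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query = fields[0]
        start, end = sorted((int(fields[6]), int(fields[7])))
        intervals.setdefault(query, []).append((start, end))

    coverage = {}
    for query, length in lengths.items():
        covered, last_end = 0, 0
        # Walk the sorted hits and count each covered base only once.
        for start, end in sorted(intervals.get(query, [])):
            start = max(start, last_end + 1)
            if end >= start:
                covered += end - start + 1
                last_end = end
        coverage[query] = covered / length
    return coverage


def keep_transcripts(coverage, max_fraction=0.5):
    """Return IDs whose repeat coverage is below max_fraction."""
    return [tid for tid, frac in coverage.items() if frac < max_fraction]
```

Overlapping hits are merged before counting, so a transcript hit by several similar repeat families is not penalised more than once for the same bases.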


RepeatMasker does have an option for parallelisation (-pa). On top of that, you can split your genome and use different nodes plus multiple cores. Within RepeatMasker there are also speed/sensitivity settings (-s, -q, -qq) that may help you speed up the analyses, and there are alternative search engines (-engine). Some may be faster than others (I have no idea which; you could ask the RepeatMasker people). I hope this helps.
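
As a rough sketch of how these options combine, the helper below only assembles a RepeatMasker command line and does not run anything. The helper name and its defaults are assumptions for illustration; the flags themselves (`-pa`, `-s`/`-q`/`-qq`, `-engine`, `-species`, `-lib`, `-dir`) are standard RepeatMasker options, but which engines are usable depends on how the local installation was configured.

```python
import shlex


def repeatmasker_cmd(fasta, out_dir, threads=8, speed="q",
                     engine=None, species=None, lib=None):
    """Build a RepeatMasker command as an argument list (hypothetical helper).

    speed: "s" (slow, more sensitive), "q" (quick), "qq" (rush),
           or None for the default sensitivity.
    engine: e.g. "rmblast", "abblast", "crossmatch"; None uses the
            engine configured at installation time.
    """
    if speed not in ("s", "q", "qq", None):
        raise ValueError(f"unknown speed setting: {speed!r}")
    if species and lib:
        raise ValueError("give either species or lib, not both")

    cmd = ["RepeatMasker", "-pa", str(threads), "-dir", str(out_dir)]
    if speed:
        cmd.append(f"-{speed}")
    if engine:
        cmd += ["-engine", engine]
    if species:
        cmd += ["-species", species]
    if lib:
        cmd += ["-lib", str(lib)]
    cmd.append(str(fasta))
    return cmd


if __name__ == "__main__":
    # Print the command so it can be pasted into a job script.
    print(shlex.join(repeatmasker_cmd("batch_0000.fa", "rm_out",
                                      threads=4, engine="abblast")))
```

The returned list can be passed to `subprocess.run` or written into a cluster job script, one command per genome piece.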


Thank you for that - MAKER splits the genome and runs that over MPI, so I assume that part is taken care of.

I guess I'll have to fiddle with the sensitivity settings and get hold of an ab-blast binary; that should be faster!
