Hi all, I need to annotate a green algal genome (40 Mbp, 53 contigs) with the funannotate pipeline, which first requires soft-masking the repetitive regions. I tried the funannotate mask command, which runs RepeatModeler (on 1 node with 24 cores), but it takes an extremely long time (>300 hours!). The program does seem to find repeat families and it modifies the input FASTA (it puts NNN(...)NN through the sequence), but the job runs so long that it gets killed before it finishes. So I have two questions:
- Is it normal for it to take so long? Is there any way to make this faster?
- If nothing can be done to speed it up, I will have to resume from the killed run. I think I should use
RepeatModeler -recoverDir (output dir) -database (???)
... but I don't know what my database is. Which file should I specify here? These are the files the previous run created in my working directory:
Repeats.nhr Repeats.nnd Repeats.nog Repeats.translation unaligned.fa Repeats.nin Repeats.nni Repeats.nsq
and a directory (RM_19910.TueJun181524232019) with these two files:
consensi.fa families.stk
and subdirectories containing results of the RepeatModeler rounds.
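My own guess, which I have not been able to confirm, is that the Repeats.n* files together form a BLAST-style database and that the database name is their shared prefix (Repeats) rather than any single file. Here is a small sketch I would use to list candidate names in a directory; list_db_names is a helper I made up for illustration, and the assumption that the name equals the .nsq basename is mine:

```shell
# List candidate database names: the basename (without extension) of every
# *.nsq file in the given directory (default: current directory).
# ASSUMPTION: the database name is the prefix shared by the index files.
list_db_names() {
    for f in "${1:-.}"/*.nsq; do
        [ -e "$f" ] || continue   # glob matched nothing, so skip the literal pattern
        basename "$f" .nsq
    done
}

# Hypothetical resume command, reusing only the two flags from my command above
# and the output directory of my killed run (untested):
# RepeatModeler -recoverDir RM_19910.TueJun181524232019 -database "$(list_db_names . | head -n 1)"
```

With the files listed above this should print just Repeats, but I would appreciate confirmation that this is really what -database expects.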
Could anyone help me with this, please?