I installed RepeatMasker on my PC (Ubuntu 10.04, 8 GB RAM, Core i5 CPU). It runs very well on small sequence data (<10M), but on larger data (e.g. a human chromosome, 100M+) RepeatMasker always stops before it reports a result. Is my computer's hardware insufficient for data of this size, or is something else wrong? And what are the usual requirements for running this software? Does anyone know?
If RepeatMasker produced .cat files and stopped just before producing the .out files, you can generate them by running ProcessRepeats on the .cat files. ProcessRepeats is part of the same package and is run much like RepeatMasker. The .out file produced this way will differ only slightly from the one RepeatMasker itself would have written, so you can ignore the differences and simply use the newly produced file.
(I have no idea why this freezing happens - in our case restarting the server helped.)
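The recovery step above can be scripted when many runs were interrupted. What follows is only a minimal sketch, not part of RepeatMasker: the helper names `cat_files_missing_out` and `process_repeats_command` are made up for illustration. It assumes the usual output naming (`seq.fa.cat` or `seq.fa.cat.gz` next to `seq.fa.out`) and that ProcessRepeats is on your PATH and accepts `-species` the same way RepeatMasker does; check the usage text of your installed version before relying on that.

```python
from pathlib import Path


def cat_files_missing_out(run_dir):
    """Return .cat / .cat.gz files in run_dir that have no matching .out file."""
    missing = []
    for cat in sorted(Path(run_dir).iterdir()):
        name = cat.name
        if name.endswith(".cat.gz"):
            stem = name[: -len(".cat.gz")]
        elif name.endswith(".cat"):
            stem = name[: -len(".cat")]
        else:
            continue  # not a .cat file
        # Assumed naming: seq.fa.cat(.gz) pairs with seq.fa.out
        if not (cat.parent / (stem + ".out")).exists():
            missing.append(cat)
    return missing


def process_repeats_command(cat_path, species="human"):
    """Build (but do not run) a ProcessRepeats command line for one .cat file."""
    # Assumption: ProcessRepeats is on PATH and takes -species like RepeatMasker.
    return ["ProcessRepeats", "-species", species, str(cat_path)]
```

Printing `" ".join(process_repeats_command(p))` for each path returned by `cat_files_missing_out(".")` lets you inspect the commands first; once they look right, they can be passed to `subprocess.run`.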
I have tested RepeatMasker and Dust, and found both of them to be incredibly slow, as well as over-aggressive at masking things that should not be masked, resulting in incorrect output. I find the concept of masking very mysterious. I'm not sure where it started, but I can't imagine it has advanced science as often as it has created misinformation; it seems to me that masking is generally used to allow a bad tool to run in finite time and get published.
These days, there are good tools that don't need masking. I recommend you use them.
That said, I have written masking software, for those times when it's a good idea. I calibrated it to ensure that, unlike RepeatMasker, it only excludes the most repetitive portions of a genome (under 3% of the human reference genome at default settings). It's intended for specialized cases and is not generally relevant, and I absolutely do not recommend using it as casually as people seem to use RepeatMasker. But it's 1000x faster than RepeatMasker and Dust, so if you need to mask low-complexity portions of a genome, or the portions of a genome that are homologous to some other genome, it's available in the BBMap package as bbmask.sh. I designed it for contamination removal (to prevent false positives) and to prevent spurious results when BLASTing unknown sequence against a huge database like nt (again, to prevent false positives). I do not recommend masking in other contexts.
Hi, I have the same problem. When I run RepeatMasker on the human genome sequence (3 G), the computer restarts just before the last step that produces the result, without any error message or output. However, a 1 G sequence works fine. The machine runs Linux with 128 GB RAM. Have you solved the problem?
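Since smaller inputs complete in both reports above, one possible workaround (not a diagnosis of the crash) is to split the genome FASTA into batches of whole sequences and run RepeatMasker on each batch separately. The sketch below is only an illustration under that assumption: the function names and the `.partN.fa` output naming are hypothetical, the 1 G default simply mirrors the size reported to work above, and the per-batch results still have to be combined afterwards.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def batch_records(records, max_bases):
    """Group whole records into batches whose total length does not exceed max_bases.

    A single record longer than max_bases ends up in a batch of its own.
    """
    batch, size = [], 0
    for header, seq in records:
        if batch and size + len(seq) > max_bases:
            yield batch
            batch, size = [], 0
        batch.append((header, seq))
        size += len(seq)
    if batch:
        yield batch


def write_batches(fasta_path, out_prefix, max_bases=1_000_000_000):
    """Write each batch to <out_prefix>.partN.fa and return the paths written."""
    paths = []
    with open(fasta_path) as handle:
        batches = batch_records(read_fasta(handle), max_bases)
        for i, batch in enumerate(batches, 1):
            path = f"{out_prefix}.part{i}.fa"
            with open(path, "w") as out:
                for header, seq in batch:
                    out.write(f">{header}\n")
                    # Wrap sequence lines at 60 characters
                    for j in range(0, len(seq), 60):
                        out.write(seq[j:j + 60] + "\n")
            paths.append(path)
    return paths
```

Sequences are never cut in the middle, so each chromosome or scaffold stays intact in exactly one batch file.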