Question

Hardware Requirements For RepeatMasker

11.7 years ago

chjiao3456 ▴ 40

I installed RepeatMasker on my PC (Ubuntu 10.04, 8 GB RAM, i5 CPU). It runs fine on small sequence data (<10 Mb), but on larger data (e.g. a human chromosome, 100 Mb+) RepeatMasker always stops before it reports a result. Is my hardware insufficient for data of this size, or is something else wrong? And what are the usual hardware requirements for running this software? Does anyone know?

repeatmasker • 4.5k views

ADD COMMENT • link updated 6.6 years ago by miya ▴ 80 • written 11.7 years ago by chjiao3456 ▴ 40

I think it is something else. I have already run RepeatMasker on Linux boxes with 2 cores and 4 GB of RAM, masking human-genome-sized inputs.

ADD REPLY • link 11.7 years ago by JC 13k

And how long did the run take?

ADD REPLY • link 11.7 years ago by chjiao3456 ▴ 40

It takes a few hours, ~10 hrs.

ADD REPLY • link 11.7 years ago by JC 13k

Thanks very much! There are three sequence search engines available (Cross_Match, RMBlast and ABBlast) and I am using RMBlast. Which one is on your machine?

ADD REPLY • link 11.7 years ago by chjiao3456 ▴ 40

I have Cross_Match and RMBlast installed. Did you find out what the problem is?

ADD REPLY • link 11.7 years ago by JC 13k

I think it's a RAM limitation. RepeatMasker occupies nearly all of my RAM (8 GB) while working, and its demand grows over time; I think that's why the program always stops before any results appear. I used the default value for the -maxsize option, and the help document says "Memory requirements go up with higher maxsize." Did you change this or any other options when running RepeatMasker?

ADD REPLY • link 11.7 years ago by chjiao3456 ▴ 40

No, I use the defaults.

ADD REPLY • link 11.7 years ago by JC 13k

I used these settings: RepeatMasker -nolow -norna -div 18 -gccalc -maxsize 999000000 -species human

ADD REPLY • link 11.7 years ago by Biomonika (Noolean) 3.2k

You're using a big -maxsize; the default is 4000000 (4 Mb). When you increase that value your RAM usage will blow up. Try reducing it.

ADD REPLY • link 11.7 years ago by JC 13k
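
For illustration, the settings quoted above could be rerun with -maxsize set back to the default value JC mentions. This is only a sketch: the function name mask_default_maxsize and the REPEATMASKER override variable are made up for this example, chr1.fa is a placeholder file name, and nobody in this thread has confirmed that this alone fixes the stall.

```shell
# Sketch only: mask_default_maxsize and REPEATMASKER are hypothetical names.
# The flags are the ones quoted in this thread; only -maxsize is changed,
# back to the default value stated above (4000000).
mask_default_maxsize() {
    fasta="$1"
    maxsize="${2:-4000000}"
    # Refuse to start on a missing or empty input file.
    if [ ! -s "$fasta" ]; then
        echo "input FASTA missing or empty: $fasta" >&2
        return 1
    fi
    "${REPEATMASKER:-RepeatMasker}" -nolow -norna -div 18 -gccalc \
        -maxsize "$maxsize" -species human "$fasta"
}

# Usage (placeholder file name):
# mask_default_maxsize chr1.fa
```

A second argument overrides the value, so it is easy to try a few settings while watching memory use with a tool such as top.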

score 2 · Answer 1 · 2012-08-21

11.7 years ago

Biomonika (Noolean) 3.2k

If RepeatMasker produced the .cat files and stopped just before producing the .out files, you can get those by running ProcessRepeats (a program that is part of the same package and is run similarly to RepeatMasker) on the .cat files. The .out file produced this way will in fact differ only slightly from the one RepeatMasker itself would have written, so you can neglect the differences and just use the newly produced file.

(I have no idea why this freezing happens - in our case restarting the server helped.)

ADD COMMENT • link 11.7 years ago by Biomonika (Noolean) 3.2k
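
A minimal sketch of that recovery step, assuming the stalled run left a non-empty .cat file behind and that ProcessRepeats is on the PATH. The function name finish_from_cat and the PROCESSREPEATS override variable are made up for this example, and chr1.fa.cat is a placeholder; pass the same species as in the original RepeatMasker run.

```shell
# Sketch only: finish_from_cat and PROCESSREPEATS are hypothetical names.
finish_from_cat() {
    cat_file="$1"            # the .cat file left behind by the stalled run
    species="${2:-human}"    # should match the species of the original run
    # Nothing to post-process if the .cat file is missing or empty.
    if [ ! -s "$cat_file" ]; then
        echo "no usable .cat file at: $cat_file" >&2
        return 1
    fi
    "${PROCESSREPEATS:-ProcessRepeats}" -species "$species" "$cat_file"
}

# Usage (placeholder file name):
# finish_from_cat chr1.fa.cat human
```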

Thanks for your answer! RepeatMasker stops while producing the .out file, and I am now running ProcessRepeats on the .cat file. However, ProcessRepeats also occupies nearly all of my RAM (8 GB) while working. I think that's why RepeatMasker always stops at this step; maybe I should try a server with more RAM.

ADD REPLY • link 11.7 years ago by chjiao3456 ▴ 40

I can get results now! Thanks. So can I use RepeatMasker to produce only the .cat files and then run ProcessRepeats on them myself?

ADD REPLY • link 11.7 years ago by chjiao3456 ▴ 40

Hi, unfortunately I don't know how to produce just the .cat files with RepeatMasker :/ However, I am glad that I could help you (you can upvote my answer in return ;-)

ADD REPLY • link 11.7 years ago by Biomonika (Noolean) 3.2k

score 0 · Answer 2 · 2017-03-02

I have tested RepeatMasker and Dust, and found both of them to be incredibly slow, in addition to being over-aggressive at masking things that should not be masked, resulting in incorrect output. I find the concept of masking to be very mysterious. I'm not sure where it started, but I can't imagine it has advanced science as often as it has created misinformation; it seems to me that masking is generally used to allow a bad tool to run in finite time and get published.

These days, there are good tools that don't need masking. I recommend you use them.

That said, I have written masking software, for those times when it's a good idea. I calibrated it to ensure that, unlike RepeatMasker, it only excludes the most repetitive portions of a genome (under 3% of the human reference genome at default settings). It's intended for specialized cases and is not generally relevant, and I absolutely do not recommend using it as casually as people seem to use RepeatMasker. But it's 1000x faster than RepeatMasker and Dust, so if you need to mask low-complexity portions of a genome, or the portions of a genome that are homologous to some other genome, it's available in the BBMap package as bbmask.sh. I designed it for contamination removal (to prevent false positives) and to prevent spurious results when BLASTing unknown sequence against a huge database like nt (again, to prevent false positives). I do not recommend masking in other contexts.

score 0 · Answer 3 · 2017-09-04

6.6 years ago

miya ▴ 80

Hi, I have run into the same problem. When I run RepeatMasker on the human genome sequence (3 Gb), the computer restarts just before the last step that writes the results, without any error message or output. However, a 1 Gb sequence works fine. The machine runs Linux with 128 GB of RAM. Have you solved the problem?

ADD COMMENT • link 6.6 years ago by miya ▴ 80
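
Since smaller inputs are reported to work in this thread (under 10 Mb on the 8 GB PC, 1 Gb on the 128 GB machine), one thing that might be worth trying is splitting the genome FASTA into one file per sequence and masking the pieces separately. The sketch below only does the splitting; split_fasta is a made-up helper name, genome.fa and chunks are placeholders, and whether this avoids the stall has not been verified here.

```shell
# Sketch only: split_fasta is a hypothetical helper name.
# Writes one FASTA file per record into the output directory, named after
# the first word of each header. Records with identical names overwrite
# each other, so this assumes sequence names are unique.
split_fasta() {
    fasta="$1"
    outdir="$2"
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        /^>/ {
            if (out != "") close(out)           # finish the previous record
            name = substr($1, 2)                # header word without ">"
            gsub(/[^A-Za-z0-9._-]/, "_", name)  # keep file names safe
            out = dir "/" name ".fa"
        }
        out != "" { print > out }
    ' "$fasta"
}

# Usage (placeholder names), masking each piece in turn:
# split_fasta genome.fa chunks
# for f in chunks/*.fa; do RepeatMasker -species human "$f"; done
```

Each piece then gets its own set of output files, which would need to be combined afterwards if a single genome-wide table is wanted.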