Question: Hardware requirements for RepeatMasker
chjiao3456 (40), Michigan State University, USA, wrote 8.3 years ago:

I installed RepeatMasker on my PC (Ubuntu 10.04, 8 GB RAM, i5 CPU). It runs very well on small sequence data (<10M), but on larger data (e.g. a human chromosome, 100M+) RepeatMasker always stops before it can report a result. Is my computer's hardware insufficient for data of this size, or is it something else? And what are the usual hardware requirements for running this software? Does anyone know?

Tags: repeatmasker
written 8.3 years ago by chjiao3456 (40) • modified 3.2 years ago by miya (80)

I think it is something else. I have already run RepeatMasker on Linux boxes with 2 cores and 4 GB of RAM, masking sequences the size of the human genome.

written 8.3 years ago by JC (12k)

And how long did the process take?

written 8.3 years ago by chjiao3456 (40)

It takes a few hours, ~10 hrs.

written 8.3 years ago by JC (12k)

Thanks very much! There are three sequence search engines available (Cross_Match, RMBlast and ABBlast), and I am using RMBlast. Which one is on your machine?

written 8.3 years ago by chjiao3456 (40)

I have Cross_Match and RMBlast installed. Did you find out what the problem is?

written 8.3 years ago by JC (12k)

I think it's a RAM limitation. RepeatMasker occupies nearly all my RAM (8 GB) while working, and its memory demand grows as the run goes on; I think that's why the program always stops before producing results. I used the default value for the -maxsize option, and the help document says that "Memory requirements go up with higher maxsize." Did you change this or other options when running RepeatMasker?

written 8.3 years ago (modified 8.3 years ago) by chjiao3456 (40)
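One way to check the RAM hypothesis above is to run the job from a small wrapper that records the exit code and the peak resident memory of the child processes. The following is a minimal sketch, not part of RepeatMasker: the helper name `run_and_report_peak_memory` is hypothetical, it relies on Python's Unix-only `resource` module, and the stand-in command should be replaced by the real invocation.

```python
import resource
import subprocess
import sys


def run_and_report_peak_memory(cmd):
    """Run cmd to completion and return (exit_code, peak_rss).

    peak_rss is ru_maxrss for waited-for child processes: kilobytes on
    Linux, bytes on macOS. It reflects the largest single child seen so
    far by this Python process, not the sum over a process tree.
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss


if __name__ == "__main__":
    # Stand-in command that allocates about 50 MB; replace it with the
    # real command line, e.g. ["RepeatMasker", "-species", "human", "chr1.fa"].
    code, peak = run_and_report_peak_memory(
        [sys.executable, "-c", "x = bytearray(50_000_000)"]
    )
    print(f"exit code {code}, peak RSS {peak} (KB on Linux)")
```

A negative exit code from `subprocess.run` means the child was terminated by a signal (for example -9 for SIGKILL), which is worth checking when a program stops without printing any error.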

No, I use the defaults.

written 8.3 years ago by JC (12k)

I used these settings: RepeatMasker -nolow -norna -div 18 -gccalc -maxsize 999000000 -species human

written 8.3 years ago by Biomonika (Noolean) (3.1k)

You're using a big -maxsize; the default is 4000000 (4Mb). When you increase that value, your RAM usage will blow up. Try reducing it.

written 8.3 years ago by JC (12k)
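To put the two -maxsize values from this exchange side by side, here is a small sketch that compares a chosen value with the default quoted above and builds a command line that only passes -maxsize when it is set explicitly. The helper names are hypothetical, the default of 4000000 is taken from the comment above rather than checked against any particular RepeatMasker release, and only flags that appear in this thread are used.

```python
DEFAULT_MAXSIZE = 4_000_000  # default value as quoted in this thread


def maxsize_ratio(maxsize):
    """Return how many times larger maxsize is than the quoted default."""
    return maxsize / DEFAULT_MAXSIZE


def build_repeatmasker_cmd(fasta_path, species="human", maxsize=None):
    """Build an argument list for RepeatMasker (hypothetical helper).

    -maxsize is added only when a value is given, so that leaving it
    unset falls back to the program's own default.
    """
    cmd = ["RepeatMasker", "-species", species]
    if maxsize is not None:
        cmd += ["-maxsize", str(maxsize)]
    cmd.append(fasta_path)
    return cmd


if __name__ == "__main__":
    print(maxsize_ratio(999_000_000))  # 999000000 vs. the quoted default
    print(" ".join(build_repeatmasker_cmd("chr1.fa")))
```

The resulting list can be passed to `subprocess.run`; `chr1.fa` is a placeholder file name.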
Biomonika (Noolean) (3.1k), State College, PA, USA, wrote 8.3 years ago:

If RepeatMasker produced .cat files and stopped just before producing the .out files, you can get those by running ProcessRepeats (a program that is run similarly to RepeatMasker and is also part of the package) on the .cat files. The two .out files will in fact differ only slightly, so you can neglect the differences and simply use the newly produced file.

(I have no idea why this freezing happens - in our case restarting the server helped.)

written 8.3 years ago (modified 8.3 years ago) by Biomonika (Noolean) (3.1k)
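The recovery step described in this answer can be sketched as follows: look for .cat files that have no matching .out file and build a ProcessRepeats command for each. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the output naming (`seq.fa.cat` next to `seq.fa.out`) and the `-species` option for ProcessRepeats are assumed and should be checked against the help text of the installed version, and compressed `.cat.gz` files are not handled.

```python
from pathlib import Path


def cat_files_missing_out(directory):
    """Return .cat files in directory whose matching .out file is absent."""
    missing = []
    for cat in sorted(Path(directory).glob("*.cat")):
        out = cat.with_suffix(".out")  # seq.fa.cat -> seq.fa.out
        if not out.exists():
            missing.append(cat)
    return missing


def build_processrepeats_cmd(cat_path, species="human"):
    """Build an argument list for ProcessRepeats (assumed option: -species)."""
    return ["ProcessRepeats", "-species", species, str(cat_path)]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed.
    for cat in cat_files_missing_out("."):
        print(" ".join(build_processrepeats_cmd(cat)))
```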

Thanks for your answer! RepeatMasker stops while it is producing the .out files, and I am now running ProcessRepeats on the .cat file. However, ProcessRepeats also occupies nearly all my RAM (8 GB) while working. I think that's why RepeatMasker always stopped at that step; maybe I should try a server with more RAM.

written 8.3 years ago by chjiao3456 (40)

I can get results now! Thanks. So can I use RepeatMasker to produce only the .cat files before I use ProcessRepeats?

written 8.3 years ago by chjiao3456 (40)

Hi, unfortunately I don't know how to produce just the .cat files with RepeatMasker :/ However, I am glad that I could help you (you can upvote my answer in return ;-)

written 8.3 years ago by Biomonika (Noolean) (3.1k)
Brian Bushnell (17k), Walnut Creek, USA, wrote 3.8 years ago:

I have tested RepeatMasker and Dust, and found both of them to be incredibly slow, in addition to being over-aggressive at masking things that should not be masked, resulting in incorrect output. I find the concept of masking to be very mysterious. I'm not sure where it started, but I can't imagine it has advanced science as often as it has created misinformation; it seems to me like masking is generally used to allow a bad tool to run in finite time and get published.

These days, there are good tools that don't need masking. I recommend you use them.

That said, I have written masking software, for those times when it's a good idea. It's calibrated to ensure that, unlike RepeatMasker, it only excludes the most repetitive portions of a genome (under 3% of the human reference genome at default settings). It's intended for specialized cases and is not generally relevant, and I absolutely do not recommend using it as casually as people seem to use RepeatMasker. But it's 1000x faster than RepeatMasker and Dust, so if you need to mask low-complexity portions of a genome, or the portions of a genome that are homologous to some other genome, it's available in the BBMap package as bbmask.sh. I designed it for contamination removal (to prevent false positives) and to prevent spurious results when BLASTing unknown sequence against a huge database like nt (again, to prevent false positives). I do not recommend masking to be used in other contexts.

written 3.8 years ago (modified 3.8 years ago) by Brian Bushnell (17k)
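For readers who want to try bbmask.sh, the sketch below builds a basic command line and checks whether the script is on the PATH before anything is run. It is a minimal sketch: the helper name is hypothetical, it assumes the usual BBTools `in=` / `out=` key-value arguments and leaves every other setting at its default, and the file names are placeholders. The script's own usage text is the authority on available parameters.

```python
import shutil


def build_bbmask_cmd(reference, masked_output):
    """Build an argument list for bbmask.sh using in=/out= arguments."""
    return ["bbmask.sh", f"in={reference}", f"out={masked_output}"]


if __name__ == "__main__":
    cmd = build_bbmask_cmd("genome.fa", "genome.masked.fa")
    if shutil.which("bbmask.sh") is None:
        # BBMap is not installed or not on PATH; only show the command.
        print("bbmask.sh not found on PATH; command would be:")
    print(" ".join(cmd))
```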
miya (80) wrote 3.2 years ago:

Hi, I have run into the same problem. When I run RepeatMasker on the human genome sequence (3 G), the computer restarts just before the last step that produces the result, without any error message or output. However, a 1 G sequence works fine. The machine runs Linux with 128 GB of RAM. Have you solved the problem?

written 3.2 years ago by miya (80)
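Since smaller inputs are reported to work in this thread, one way to test whether input size is the deciding factor is to split a multi-record FASTA file into one file per record (for example, per chromosome) and run each piece separately. Nobody in the thread proposes this; it is only a sketch of the idea, with hypothetical helper names and output file names, written in plain Python without external libraries.

```python
from pathlib import Path


def split_fasta_records(lines):
    """Yield (header, sequence_lines) for each record in FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line[1:], []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, seq


def write_one_file_per_record(fasta_path, out_dir):
    """Write each record to out_dir/record_NNNN.fa and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(fasta_path) as handle:
        records = split_fasta_records(handle)
        for index, (header, seq) in enumerate(records, start=1):
            target = out_dir / f"record_{index:04d}.fa"
            target.write_text(">" + header + "\n" + "\n".join(seq) + "\n")
            written.append(target)
    return written
```

Each record is held in memory while it is written, so the peak memory of the splitter is roughly the size of the largest single sequence.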