Question: How long does it take to run RepeatMasker on a full genome?
2.8 years ago, CAnna10 wrote:

Hi,

I am currently running a program that requires a transposable element annotation in GTF format. Normally the RepeatMasker tables from UCSC are used for this. I am using a new assembly that does not have a RepeatMasker table available yet, so I am running RepeatMasker on the entire genome (rhesus macaque).

Could anyone who has done this before tell me roughly how long it can take? I am running it with the "slow" option.

Thank you, CAnna

assembly • 1.4k views
modified 2.6 years ago by Biostar ♦♦ 20 • written 2.8 years ago by CAnna10
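
Since the goal in the question is a GTF annotation, here is a rough sketch of reshaping a RepeatMasker .out table into GTF with awk. Nothing in this thread specifies the target layout, so the function name rmsk_out_to_gtf, the "exon" feature type, and the attribute names (gene_id, transcript_id, family_id, class_id) are all assumptions; check what the downstream program actually expects before relying on it.

```shell
# Sketch only: convert a RepeatMasker .out table to GTF on stdout.
# Assumed .out columns: 1 score, 5 sequence, 6 begin, 7 end,
# 9 strand ("+" or "C"), 10 repeat name, 11 class/family.
rmsk_out_to_gtf() {
  awk 'BEGIN { OFS = "\t" }
    # Data rows start with a numeric score; header and blank lines are skipped.
    $1 ~ /^[0-9]+$/ {
      strand = ($9 == "C") ? "-" : "+"
      n = split($11, cf, "/")
      cls = cf[1]
      fam = (n > 1) ? cf[2] : cf[1]
      # Attribute names below are hypothetical; adapt them to your tool.
      attr = "gene_id \"" $10 "\"; transcript_id \"" $10 "_" $5 ":" $6 "-" $7 "\"; family_id \"" fam "\"; class_id \"" cls "\";"
      print $5, "RepeatMasker", "exon", $6, $7, $1, strand, ".", attr
    }' "$1"
}

# Example (not run here):
# rmsk_out_to_gtf MacaM_Rhesus_Genome_v7.fasta.out > MacaM_rmsk.gtf
```

The "C" strand code is mapped to "-", and the class/family column is split on "/" so both parts are available as separate attributes.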

Roughly how big is the genome in Mbp? What species setting are you using, is it "all"?

written 2.8 years ago by Philipp Bayer6.0k

Hi, thank you for your reply. The genome is about 2818 Mbp long. I set the species to "macaca mulatta". Here is the command:

RepeatMasker -species "macaca mulatta" -s -par 10 MacaM_Rhesus_Genome_v7.fasta

I was wondering whether that is even too specific; maybe I should use "primate" instead.

written 2.8 years ago by CAnna10

The last time I ran that, I think it took a week or two to finish, and that was after splitting the genome by chromosome.

written 2.8 years ago by Devon Ryan89k

Oh, I did not expect it to take so long! Splitting by chromosome is a good strategy, though. So you split the sequence by chromosome, run rmsk on each piece, and then join the masked genome and the rmsk tables afterwards, right?

Sorry, I'm kind of new to this. What kind of tool can I use to split the FASTA sequence by chromosome? samtools, maybe?

written 2.8 years ago by CAnna10

Yup, exactly. I think there are faster alternatives to RepeatMasker these days, though I've been lucky enough not to need to do this in years (not since just after mm10 came out, when it hadn't been repeat-masked yet).

modified 2.8 years ago • written 2.8 years ago by Devon Ryan89k
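
The split step discussed above can be sketched with plain awk, writing one file per FASTA record. This is only an illustration: the function name split_fasta and the output directory chroms are hypothetical, and it assumes every sequence name (the first word after ">") is safe to use as a file name. An assembly with many unplaced scaffolds would produce one small file per scaffold.

```shell
# Sketch only: write each FASTA record to <out_dir>/<sequence name>.fa
split_fasta() {
  mkdir -p "$2"
  awk -v dir="$2" '
    /^>/ {
      # New record: close the previous file and start a new one.
      if (out != "") close(out)
      name = substr($1, 2)          # first word of the header, without ">"
      out = dir "/" name ".fa"
    }
    # Lines before the first header (if any) are ignored.
    out != "" { print > out }
  ' "$1"
}

# Example (not run here): split, then mask each piece with the
# command quoted earlier in this thread.
# split_fasta MacaM_Rhesus_Genome_v7.fasta chroms
# for f in chroms/*.fa; do
#   RepeatMasker -species "macaca mulatta" -s -par 10 "$f"
# done
```

Running the pieces as separate jobs is what makes the split worthwhile; the loop above runs them one after another and is only meant to show the shape of the workflow.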

OK, good. Thank you! CAnna

written 2.8 years ago by CAnna10
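
Joining the per-chromosome rmsk tables afterwards can be sketched in the same way. This assumes each run left a .out table next to its input and that data rows start with a numeric score; merge_rmsk_out is a hypothetical name. The ID values in the last column are presumably numbered separately in each run, so they may no longer be unique after merging.

```shell
# Sketch only: concatenate RepeatMasker .out tables, keeping the header
# lines from the first file and only the data rows from the rest.
merge_rmsk_out() {
  awk '
    $1 ~ /^[0-9]+$/ { print; next }   # data row: always keep
    NR == FNR       { print }         # header or blank line: first file only
  ' "$@"
}

# Example (not run here). The glob expands in lexical order
# (chr1, chr10, chr11, ...), so list the files explicitly if order matters.
# merge_rmsk_out chroms/*.fa.out > MacaM_Rhesus_Genome_v7.fasta.out
# cat chroms/*.fa.masked > MacaM_Rhesus_Genome_v7.fasta.masked
```

Matching on a numeric first field avoids hard-coding the number of header lines.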

@Devon Ryan, which alternatives were you talking about? I would be very interested to find out, as my job has currently been running for too long! (Genome size: 870 Mb) :)

written 2.7 years ago by tlorin250

Perhaps downloading the already masked version from UCSC would be a preferable option?

written 2.8 years ago by genomax65k