Question: How long does it take to run RepeatMasker on a full genome?
CAnna • 20 wrote:
Hi,
I am currently running a program that requires a transposable element annotation in GTF format. For this, the RepeatMasker tables from UCSC are used. I am using a new assembly that has no RepeatMasker table available yet, so I am running RepeatMasker on the entire genome (rhesus macaque).
Could anyone who has done this before tell me roughly how long it can take? I am running it with the "slow" option.
Thank you, CAnna
Roughly how big is the genome in Mbp? What species setting are you using, is it "all"?
Hi, thank you for your reply. The genome is about 2818 Mbp long. I set the species to "macaca mulatta". Here is the command:
RepeatMasker -species "macaca mulatta" -s -par 10 MacaM_Rhesus_Genome_v7.fasta
I was wondering whether that is too specific; maybe I should use "primate" instead.
The last time I ran that I think it took a week or two to finish...and that was after splitting it by chromosome.
Oh, I did not expect it to take so long! Splitting by chromosome is a good strategy though. So you split your sequence by chromosome, run rmsk on each piece, and then join the masked genome and the rmsk tables afterwards, right?
Sorry, I'm fairly new to this. What kind of tool can I use to split the FASTA sequence by chromosome? samtools maybe?
Yup, exactly. I think there are faster alternatives to repeatmasker these days, though I've been lucky enough that I haven't needed to do this in years (since just after mm10 came out, since it hadn't been repeatmasked at that time).
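For the splitting step, a dedicated tool is not strictly needed. Below is a minimal Python sketch, not the method used by anyone in this thread, that writes each FASTA record to its own file. The function name `split_fasta` and the output directory are hypothetical, and it assumes every record has a unique ID (the first word after `>`). An assembly with many unplaced scaffolds will produce one file per scaffold, so grouping small scaffolds together may be preferable.

```python
import os
import re


def split_fasta(fasta_path, out_dir):
    """Write each FASTA record to out_dir/<record_id>.fa and return the paths.

    Assumption: record IDs (first word of each header) are unique;
    a repeated ID would overwrite the earlier file.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # A new record starts: close the previous output file.
                if out is not None:
                    out.close()
                fields = line[1:].split()
                record_id = fields[0] if fields else "unnamed"
                # Replace characters that are awkward in file names.
                safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", record_id)
                path = os.path.join(out_dir, safe_id + ".fa")
                out = open(path, "w")
                paths.append(path)
            # Lines before the first header (if any) are skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return paths
```

Calling `split_fasta("MacaM_Rhesus_Genome_v7.fasta", "by_chrom")` would then give one file per sequence, and the RepeatMasker command above can be run on each file separately.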
Ok good, Thank you! CAnna
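For the joining step, the masked FASTA files can simply be concatenated, but the per-chromosome `.out` tables each carry their own header. Here is a rough Python sketch for merging them, assuming the usual `.out` layout in which every data row starts with an integer score and header lines do not. The function name `merge_rmsk_out` is hypothetical. Note that the ID column restarts in each file, so IDs are not unique in the merged table.

```python
def merge_rmsk_out(out_paths, merged_path):
    """Concatenate RepeatMasker .out tables, keeping a single header.

    Assumption: data rows begin with an integer (the alignment score),
    and anything before the first data row of a file is its header.
    Returns the number of data rows written.
    """
    n_rows = 0
    header_written = False
    with open(merged_path, "w") as merged:
        for path in out_paths:
            pending_header = []
            with open(path) as handle:
                for line in handle:
                    fields = line.split()
                    if fields and fields[0].isdigit():
                        # First data row overall: emit the header once.
                        if not header_written:
                            merged.writelines(pending_header)
                            header_written = True
                        merged.write(line)
                        n_rows += 1
                    elif not header_written:
                        # Possible header line; kept only if data follows.
                        pending_header.append(line)
    return n_rows
```

Files with no hits contribute nothing, so a chromosome without detected repeats does not leave a stray header or message in the merged table.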
@Devon Ryan which alternatives were you talking about? Would be very interested to find out, as my job has currently been running for too long! (Genome size 870Mb) :)
Perhaps downloading the already masked version from UCSC would be a preferable option?