When I run RepeatMasker locally, it identifies many different types of repetitive elements: long interspersed repeats such as LINE1s, simple repeats, Alus, ancient repeats, retrovirus-like elements, and so on. I understand that there is an -alu option (only masks Alus, plus 7SLRNA, SVA and LTR5; primate DNA only), but what about the others? Is there a way to identify only one particular type of repetitive element?
There are options to allow only masking interspersed repeats or simple repeats (listed below):
-nolow / -low    Does not mask low_complexity DNA or simple repeats
-noint / -int    Only masks low complex/simple repeats (no interspersed repeats)
-norna           Does not mask small RNA (pseudo) genes
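To make the mapping from "what I want masked" to flags concrete, here is a small dry-run sketch. It only prints the command it would run, so it is safe to try without RepeatMasker installed. The helper name mask_cmd, the mode names, and genome.fa are my own placeholders, not part of RepeatMasker; the flags are the ones listed above plus -alu from the question.

```shell
# Dry-run sketch: prints a RepeatMasker command instead of running it.
# mask_cmd and its mode names are hypothetical; genome.fa is a placeholder.
mask_cmd() {
    mode=$1    # which kind of repeats to restrict masking to
    fasta=$2   # input sequence file
    case "$mode" in
        simple)       flags="-noint" ;;  # only low-complexity/simple repeats
        interspersed) flags="-nolow" ;;  # skip low-complexity/simple repeats
        alu)          flags="-alu" ;;    # Alus only (primate DNA)
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "RepeatMasker $flags $fasta"
}

mask_cmd simple genome.fa
mask_cmd interspersed genome.fa
```

Once the printed command looks right, paste it into the shell to run it for real. Adding -norna to the interspersed case would also leave small RNA (pseudo) genes unmasked.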
In addition, you can always create your own library of repeats and pass it to RepeatMasker with the -lib option, which may also be faster if you are only interested in finding specific repeats.
As SES proposed, and as Arian Smit suggested:
I solved this by creating my own library. To do this I used queryRepeatDatabase.pl, which is in the util directory of the RepeatMasker installation.
>perl queryRepeatDatabase.pl -help

queryRepeatDatabase.pl - 0.1

NAME
    queryRepeatDatabase.pl - Query the RepeatMasker repeat database.

SYNOPSIS
    queryRepeatDatabase.pl [-version] [-species <species> | -stage <stage num> | -class <class> | -id <id>] [-stat] [-tree] [-clade]

DESCRIPTION
    A utility script to query the RepeatMasker repeat database.

    The options are:

    -version
        Displays the version of the program

    -species "species name"
        The full name ( case insensitive ) of the species you would like to search for in the database. This will return all the repeats which would be used in a RepeatMasker search against this species. This includes repeats contained in the clade given by "species name" and ancestral repeats of "species name". Lastly ubiquitous sequences such as RNAs and simple repeats are also included.

    -clade
        This will modify the default behaviour of the species option and return only the repeats which are specific to your species or any of it descendents. This is useful for identifying how rich the database of repeats is for a given species/clade.

    -stage <stage num>
        The number of the RepeatMasker stage for which you would like repeats. In the past these stages were individual libraries with the following general names:

        Stage   Library
        -----   -------
        0       species.lib
        10      is.lib
        15      rodspec.lib
        20      humspec.lib
        25      simple.lib
        30      at.lib
        35      sinecutlib
        40      shortcutlib
        45      cutlib
        50      shortlib
        55      longlib
        60      mirs.lib
        65      mir.lib
        70      retrovirus.lib
        75      l1.lib

    -class <class>
        Retrieve all elements of a particular class. For example:

        DNA
        SINE
        LINE
        LTR
        Other
        RC
        Satellite
        tRNA
        Simple_repeat
        Unknown
        snRNA

    -id <id>
        Retrieve only a single id from the database.

    -stat
        Returns statistics on the sequences

    -tree
        Prints the taxonomy tree for all species in the database.

SEE ALSO
    ReapeatMasker

COPYRIGHT
    Copyright 2005-2011 Robert Hubley, Institute for Systems Biology

AUTHOR
    Robert Hubley <firstname.lastname@example.org>
Tuning the -q (quick search) and -pa (number of parallel jobs) options made it even faster :)
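For anyone who wants to reproduce this, the whole workflow can be sketched as two commands: dump one class from the repeat database into a library file, then mask with only that library. The block below is a dry run that prints the commands rather than executing them. The helper name class_lib_cmds, the file names, and the thread count are placeholders of mine, and I am assuming queryRepeatDatabase.pl writes the selected sequences to standard output in FASTA form, so check that on your installation before relying on it.

```shell
# Dry-run sketch: prints the commands rather than running them.
# class_lib_cmds is a hypothetical helper; file names and thread count are placeholders.
class_lib_cmds() {
    class=$1     # a class name accepted by -class, e.g. LINE
    fasta=$2     # sequence file to mask
    threads=$3   # number of parallel jobs passed to -pa
    lib="${class}.lib"
    # 1. Extract every element of the class into a custom library
    #    (assumes the script prints FASTA to standard output).
    echo "perl queryRepeatDatabase.pl -class $class > $lib"
    # 2. Mask using only that library; -q is the quick search mode.
    echo "RepeatMasker -lib $lib -q -pa $threads $fasta"
}

class_lib_cmds LINE genome.fa 4
```

Run the first printed command from the util directory (or give the full path to the script), then run the second on your sequence. Keep in mind that -q trades some sensitivity for speed, so drop it if you need the most complete annotation.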