How to build and use a RepeatMasker custom library inside the singularity container?
8 months ago
mthm ▴ 30

I have installed the RepeatModeler v2 Singularity container (distributed as Dfam TE Tools), which includes RepeatMasker, and I have merged the RepBase library into the RepeatMasker library using the instructions below:

> # Navigate to an appropriate directory that is persistent outside the container
> $ cd /work
> # Make a copy of RepeatMasker's Libraries directory here
> $ cp -r /opt/RepeatMasker/Libraries/ ./
> # Extract RepBase (the .tar.gz file unpacks files into Libraries/)
> $ tar -x -f /work/path/to/RepBaseRepeatMaskerEdition-#######.tar.gz
> # Run the '' script (part of the RepeatMasker package) to merge the databases,
> # specifying the custom Libraries directory.
> $ -libdir Libraries/
> # Run RepeatMasker with the LIBDIR environment variable set
> $ export LIBDIR=/path/to/Libraries

Then, inside the Singularity shell, I built the database and ran RepeatModeler on a non-model fly:

BuildDatabase -name monCan3F9  monCan3F9.fa

RepeatModeler  -database monCan3F9 -pa 40 -LTRStruct

Now I would like to merge the RepeatModeler output "monCan3F9-families.fa" with the Drosophila library before running RepeatMasker. How should I do that? I tried the "queryRepeatDatabase" command, but it was not recognized.

8 months ago
mthm ▴ 30

Since this is a very new approach, I am going to explain it here; it might help others in the future.

The new version, RepeatMasker 4.1.1, no longer has the old "queryRepeatDatabase" option; instead you should use "".

First, navigate to the directory where your name-families.fa is, then enter the Singularity shell with that directory bound (--bind) to the same path inside the container:

singularity shell --bind $PWD:$PWD path/to/(image)name.sif

Once inside the shell:


You can look up the taxonomic ID of the taxon you want to extract from the library with the names subcommand (do this if your taxon name raises a name-ambiguity error):

./ -i Libraries/RepeatMaskerLib.h5 names drosophila | head
Exact Matches
32281 Drosophila <flies,subgenus> (scientific name), Drosophila (Drosophila) (includes), Drosophila (Drosophila) Fallen, 1823 (authority)
7215 Drosophila <flies,genus> (scientific name), Drosophila Fallen, 1823 (authority), fruit flies <Drosophila> (genbank common name), fruit fly <Drosophila> (common name)

Then extract the Drosophila genus data set using the ID (7215):

-i RepeatMaskerLib.h5 families --format fasta_name --include-class-in-name --ancestors --descendants 7215 > drosophila-rm.fa

Then merge drosophila-rm.fa with name-families.fa to create the custom library.
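
The merge step can be done with plain concatenation, since both files are FASTA. Below is a minimal sketch, assuming nothing more than concatenation is wanted (no deduplication or renaming of families). The helper name merge_fasta and the output name custom-lib.fa are hypothetical, chosen here for illustration only.

```shell
#!/usr/bin/env bash
# merge_fasta OUT IN1 [IN2 ...]   (hypothetical helper, not part of RepeatMasker)
# Concatenate FASTA files into OUT. `awk 1` re-emits every input line with a
# trailing newline, so a file that lacks a final newline cannot fuse its last
# sequence line with the next file's ">" header (plain `cat` could).
merge_fasta() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_fasta OUT IN1 [IN2 ...]" >&2
        return 1
    fi
    local out="$1"
    shift
    awk 1 "$@" > "$out"
}

# Example usage (input names follow the post; custom-lib.fa is an assumed name):
#   merge_fasta custom-lib.fa drosophila-rm.fa monCan3F9-families.fa
#   grep -c '^>' custom-lib.fa    # count the families in the merged library
#   RepeatMasker -lib custom-lib.fa -pa 40 monCan3F9.fa
```

The merged file can then be passed to RepeatMasker with its -lib option, as in the last commented line; adjust -pa to the number of cores you have available.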


