How to build and use a RepeatMasker custom library inside the singularity container?
Asked 3.3 years ago by mthm

I have installed RepeatModeler v2 through the Dfam TE Tools Singularity container, which also includes RepeatMasker, and I have merged the RepBase library into the RepeatMasker library using the instructions below:

> # Navigate to an appropriate directory that is persistent outside the container
> $ cd /work
> 
> # Make a copy of RepeatMasker's Libraries directory here
> $ cp -r /opt/RepeatMasker/Libraries/ ./
> 
> # Extract RepBase (the .tar.gz file unpacks files into Libraries/)
> $ tar -x -f /work/path/to/RepBaseRepeatMaskerEdition-#######.tar.gz
> 
> # Run the 'addRepBase.pl' script (part of the RepeatMasker package) to merge the databases,
> # specifying the custom Libraries directory.
> $ addRepBase.pl -libdir Libraries/
> 
> # Run RepeatMasker with the LIBDIR environment variable set
> $ export LIBDIR=/path/to/Libraries
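Before starting RepeatMasker, it may be worth confirming that `LIBDIR` actually points at the copied directory. This is a minimal sketch. It assumes the merged library is stored as `RepeatMaskerLib.h5` inside that `Libraries/` directory, which is the file name the answer below passes to famdb.py. The function name `check_libdir` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that a Libraries directory (given as an
# argument, or taken from $LIBDIR) contains RepeatMaskerLib.h5.
check_libdir() {
    local dir="${1:-${LIBDIR:-}}"   # explicit argument first, else $LIBDIR
    if [ -z "$dir" ]; then
        echo "LIBDIR is not set" >&2
        return 1
    fi
    if [ ! -f "$dir/RepeatMaskerLib.h5" ]; then
        echo "no RepeatMaskerLib.h5 in $dir" >&2
        return 1
    fi
    echo "using library directory $dir"
}
```

For example, run `export LIBDIR=/path/to/Libraries` and then `check_libdir || exit 1` near the top of a job script, before the RepeatMasker call.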

Then, inside the Singularity shell, I built the database and ran RepeatModeler on the genome of a non-model fly:

BuildDatabase -name monCan3F9  monCan3F9.fa

RepeatModeler  -database monCan3F9 -pa 40 -LTRStruct
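RepeatModeler writes its consensus library as `<database>-families.fa` (here `monCan3F9-families.fa`), with FASTA names of the form `family_id#Class/Subclass`. Tallying those class labels gives a quick view of how much of the de novo library is classified before it is merged. This is a sketch in awk. The function name `count_te_classes` is hypothetical, and headers without a `#` tag are counted under "(no class)".

```shell
#!/usr/bin/env bash
# Hypothetical helper: count families per class label, i.e. the text after
# '#' in the first word of each FASTA header (">rnd-1_family-1#LINE/L1 ...").
count_te_classes() {
    awk '
        /^>/ {
            id = substr($1, 2)            # first word without the ">"
            n = index(id, "#")
            cls = n ? substr(id, n + 1) : "(no class)"
            counts[cls]++
        }
        END { for (c in counts) printf "%s\t%d\n", c, counts[c] }
    ' "$1" | LC_ALL=C sort
}
```

Usage: `count_te_classes monCan3F9-families.fa` prints one tab-separated line per class, sorted by class name.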

Now I would like to merge the RepeatModeler output, monCan3F9-families.fa, with the Drosophila library before running RepeatMasker. How should I do that? I tried the queryRepeatDatabase command, but it was not recognized.

Tags: te, container, singularity, repeat
Answered 3.3 years ago by mthm

Since this is a very new approach, I am going to explain it here; it might help others in the future.

The new version, RepeatMasker 4.1.1, no longer has the old queryRepeatDatabase command. Instead, you should use famdb.py, which queries the HDF5 library file (RepeatMaskerLib.h5).

First, navigate to the directory that contains your name-families.fa. Then start a Singularity shell, using --bind to mount that directory at the same path inside the container:

singularity shell --bind $PWD:$PWD path/to/(image)name.sif

Once you are inside the shell, the prompt changes to:

singularity>

You can look up the taxonomy ID of the taxon you want to extract from the library with the names command. This is needed if your taxon name raises an ambiguity error. For example, "drosophila" matches both the subgenus (32281) and the genus (7215):

./famdb.py -i Libraries/RepeatMaskerLib.h5 names drosophila | head
Exact Matches
=============
32281 Drosophila <flies,subgenus> (scientific name), Drosophila (Drosophila) (includes), Drosophila (Drosophila) Fallen, 1823 (authority)
7215 Drosophila <flies,genus> (scientific name), Drosophila Fallen, 1823 (authority), fruit flies <Drosophila> (genbank common name), fruit fly <Drosophila> (common name)
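In this output the rank appears inside the angle brackets after the name, so the genus-level ID can be picked out with a one-line filter. This is a sketch. It assumes the `names` output keeps the layout shown above, with the ID first and the rank after a comma inside `<...>`. The function name `genus_taxid` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `famdb.py ... names <taxon>` output on stdin and
# print the taxonomy ID of every entry whose rank tag ends in ",genus>".
# ",genus>" does not match ",subgenus>", so subgenus entries are skipped.
genus_taxid() {
    awk '/^[0-9]+ / && /,genus>/ { print $1 }'
}
```

Usage: `./famdb.py -i Libraries/RepeatMaskerLib.h5 names drosophila | genus_taxid`. With the output shown above, only the genus line (7215) passes the filter.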

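Then extract the Drosophila genus families using its ID (7215). --ancestors also includes families assigned to higher taxa above the genus, and --descendants includes families assigned to taxa within it. Adjust the path to RepeatMaskerLib.h5 to match where your Libraries directory is: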

famdb.py -i RepeatMaskerLib.h5 families --format fasta_name --include-class-in-name --ancestors --descendants 7215 > drosophila-rm.fa
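`--include-class-in-name` writes each family name as `name#Class/Subclass`. This is the header convention RepeatMasker uses for custom libraries, and RepeatModeler's `-families.fa` output already follows it. Before merging, a quick check can confirm that no entry in either file is missing the `#` tag. This is a sketch. The function name `untagged_headers` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: print "file: header" for every FASTA header whose
# first word has no "#Class" tag. Prints nothing when all entries are tagged.
untagged_headers() {
    awk '/^>/ && index($1, "#") == 0 { print FILENAME ": " $0 }' "$@"
}
```

Usage: `untagged_headers drosophila-rm.fa name-families.fa`. Empty output means every family carries a class label.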

Finally, merge drosophila-rm.fa with name-families.fa into a single FASTA file. This is the custom library, and you pass it to RepeatMasker with the -lib option.
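For FASTA libraries, merging simply means concatenation. This sketch adds two small safeguards. It uses `awk 1` instead of `cat` so that a file without a final newline cannot glue its last line onto the next file's first header, and it warns about family IDs that appear in both inputs. The function name `merge_te_libs` and the output name `custom-lib.fa` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge FASTA libraries into one custom library and
# warn about sequence IDs (first word of each header) that occur twice.
merge_te_libs() {
    local out="$1"
    shift
    # awk 1 re-emits every input line with a trailing newline.
    awk 1 "$@" > "$out" || return 1
    local dups
    dups=$(awk '/^>/ { print substr($1, 2) }' "$out" | sort | uniq -d)
    if [ -n "$dups" ]; then
        printf 'warning: duplicated IDs in %s:\n%s\n' "$out" "$dups" >&2
    fi
    printf '%s sequences written to %s\n' "$(grep -c '^>' "$out")" "$out"
}
```

Usage: `merge_te_libs custom-lib.fa drosophila-rm.fa monCan3F9-families.fa`, then `RepeatMasker -lib custom-lib.fa monCan3F9.fa`.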
