3.4 years ago
Chvatil
Hello, I'm writing because I'm facing an error when I run RepeatModeler.
Here is the code I used:
Sp_name=Tryphoninae_B
ASSEMBLY=/beegfs/data/these/Genomes/Tryphoninae_B/Tryphoninae_B.fa
cd /beegfs/data/these/Genomes/Tryphoninae_B/run_reapeat/
/beegfs/data/TOOLS/RepeatModeler/BuildDatabase -name $Sp_name.DB -engine rmblast $ASSEMBLY
echo 'BuildDatabase done'
echo date
/beegfs/data/TOOLS/RepeatModeler/RepeatModeler -database $Sp_name.DB -pa 6 -LTRStruct
echo 'RepeatModeler done '
Here is the error message I got:
date
Building database Tryphoninae_B.DB:
Reading /beegfs/data/these/Genomes/Tryphoninae_B/Tryphoninae_B_corrected.fa...
Number of sequences (bp) added to database: 966353 ( 342816151 bp )
BuildDatabase done
date
RepeatModeler Version 2.0.1
===========================
Search Engine = rmblast 2.10.0+
Dependencies: TRF , RECON , RepeatScout , RepeatMasker
LTR Structural Analysis: Enabled ( GenomeTools , LTR_Retriever v2.9.0,
Ninja , MAFFT 7.471,
CD-HIT 4.8.1 )
Random Number Seed: 1606139499
Database = Tryphoninae_B.DB .................................................................................................
- Sequences = 966353
- Bases = 342816151
- N50 = 2702
- Contig Histogram:
Size(bp) Count
-----------------------------------------------------------------------
139193-149129 | [ ]
129258-139193 | [ ]
119323-129258 | [ 4 ]
109387-119322 | [ 2 ]
99452-109387 | [ 3 ]
89517-99452 | [ 4 ]
79582-89517 | [ 8 ]
69646-79581 | [ 14 ]
59711-69646 | [ 32 ]
49776-59711 | [ 70 ]
39841-49776 | [ 154 ]
29905-39840 | [ 347 ]
19970-29905 | [ 912 ]
10035-19970 | [ 3240 ]
100-10035 |************************************************* [ 961562 ]
WARN: The N50 for this assembly is low ( <10,000 ). The de novo methods
employed by RepeatModeler are intended for use with long contiguous
sequences and may not perform well with an over-abundance of short
contigs in the database.
Using output directory = /beegfs/data/these/Genomes/Tryphoninae_B/run_reapeat/RM_750.MonNov231452352020
Storage Throughput = fair ( 354.35 MB/s )
Ready to start the sampling process.
INFO: The runtime of RepeatModeler heavily depends on the quality of the assembly
and the repetitive content of the sequences. It is not imperative
that RepeatModeler completes all rounds in order to obtain useful
results. At the completion of each round, the files ( consensi.fa, and
families.stk ) found in:
/beegfs/data/these/Genomes/Tryphoninae_B/run_reapeat/RM_750.MonNov231452352020/
will contain all results produced thus far. These files may be
manually copied and run through RepeatClassifier should the program
be terminated early.
RepeatModeler Round # 1
========================
Searching for Repeats
-- Sampling from the database...
- Gathering up to 40000000 bp
- Final Sample Size = 40011581 bp ( 40009942 non ambiguous )
- Num Contigs Represented = 116741
- Sequence extraction : 00:00:15 (hh:mm:ss) Elapsed Time
-- Running RepeatScout on the sequences...
- RepeatScout: Running build_lmer_table ( l = 14 )..
build_lmer_table failed. Exit code 256
Do you have any idea what is going on, please?
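One note on the "Exit code 256" in the log, in case it helps with the diagnosis. RepeatModeler is a Perl program, so I am assuming (I have not checked its source) that the number it prints is the raw wait status from running build_lmer_table rather than the exit code itself. Under that assumption the real exit code sits in the high byte, which the small sketch below decodes. The function name `decode_wait_status` is a hypothetical helper of mine, not something shipped with RepeatModeler or RepeatScout.

```shell
# Hypothetical helper (not part of RepeatModeler or RepeatScout):
# split a raw wait status into the child's exit code or terminating signal.
# Assumption: the "256" in the log is a raw wait status.
decode_wait_status() {
    status=$1
    exit_code=$(( (status >> 8) & 255 ))   # high byte: the child's exit code
    signal=$(( status & 127 ))             # low 7 bits: terminating signal, 0 if none
    if [ "$signal" -ne 0 ]; then
        echo "killed by signal $signal"
    else
        echo "exit code $exit_code"
    fi
}

decode_wait_status 256   # prints "exit code 1"
```

If that reading is right, build_lmer_table exited on its own with a generic failure code of 1 rather than being killed by a signal, so the log alone does not say why it failed.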