Question: Error in RepeatModeler consensi file output
ngs_new_user wrote (14 months ago):

Hello everyone, I am trying to analyze repeat sequences in a non-model organism (an insect) using RepeatModeler and RepeatMasker. I began by building repeat models with RepeatModeler, first creating a database for the species (species.DB):

BuildDatabase -name species.DB -engine NCBI species.fa

Then I ran RepeatModeler on that database:

RepeatModeler -database species.DB -engine ncbi -pa 16
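RepeatModeler 1.x keeps its working files in an RM_* directory created where it was launched, and when the classification step succeeds the classified library appears there as consensi.fa.classified. The sketch below checks which of the two library files a run left behind. It assumes that 1.x directory layout (newer releases may name their outputs differently), and the function name check_rm_output is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which consensus library a RepeatModeler run produced.
# Assumption: the run created an RM_* working directory under $1
# (default: the current directory), as RepeatModeler 1.x does.
check_rm_output() {
  local base="${1:-.}"
  local dir
  # Pick the most recently modified RM_* entry, if any exists.
  dir=$(ls -1dt "$base"/RM_* 2>/dev/null | head -n 1)
  if [ -z "$dir" ]; then
    echo "no RM_* directory found in $base"
    return 1
  fi
  if [ -s "$dir/consensi.fa.classified" ]; then
    echo "classified: $dir/consensi.fa.classified"
  elif [ -s "$dir/consensi.fa" ]; then
    # Only the raw consensi exist: classification did not finish.
    echo "unclassified only: $dir/consensi.fa"
  else
    echo "no consensi library in $dir"
    return 1
  fi
}
```

Running `check_rm_output .` from the directory where RepeatModeler was started would print "unclassified only: …" in the situation described here.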

I believe the next step is to use one of the output files (consensi.fa.classified) as the repeat library in RepeatMasker. However, none of the repeat library files output by RepeatModeler have "classified" in their names; I only have a consensi.fa file. I therefore used that file in RepeatMasker, and all of the identified repeat sequences are listed as "Unclassified", as shown below. What am I missing? Is there a step I should run to get a classified list of the repeat sequences? Also, why does it assume the query species is "homo"? I'm assuming that means Homo sapiens. Any suggestions will be highly appreciated. Thank you.

==================================================
file name: species.fa         
sequences:        107111
total length:  279238173 bp  (275719484 bp excl N/X-runs)
GC level:         27.25 %
bases masked:   78241030 bp ( 28.02 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
SINEs:                0            0 bp    0.00 %
      ALUs            0            0 bp    0.00 %
      MIRs            0            0 bp    0.00 %

LINEs:                0            0 bp    0.00 %
      LINE1           0            0 bp    0.00 %
      LINE2           0            0 bp    0.00 %
      L3/CR1          0            0 bp    0.00 %

LTR elements:         0            0 bp    0.00 %
      ERVL            0            0 bp    0.00 %
      ERVL-MaLRs      0            0 bp    0.00 %
      ERV_classI      0            0 bp    0.00 %
      ERV_classII     0            0 bp    0.00 %

DNA elements:         0            0 bp    0.00 %
     hAT-Charlie      0            0 bp    0.00 %
     TcMar-Tigger     0            0 bp    0.00 %

Unclassified:    656772     85428964 bp   30.59 %

Total interspersed repeats: 85428964 bp   30.59 %


Small RNA:            0            0 bp    0.00 %

Satellites:           0            0 bp    0.00 %
Simple repeats:  245893     10827053 bp    3.88 %
Low complexity:       0            0 bp    0.00 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element


The query species was assumed to be homo          
RepeatMasker Combined Database: Dfam_Consensus-20171107, RepBase-20170127
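RepeatMasker takes the class of a custom-library repeat from the "#Class/Subclass" suffix of its FASTA ID (for example >rnd-1_family-1#LTR/Gypsy). A library whose IDs carry no such suffix, as in an unclassified consensi.fa, would be consistent with every hit landing under "Unclassified" in the table above. A minimal awk-based sketch for tallying the class tags in a library before masking follows; the function name count_library_classes is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count the repeat classes encoded in a FASTA library's IDs.
# IDs of the form ">name#Class/Subclass" are counted by Class/Subclass;
# IDs without a "#" tag are reported as "(no class tag)".
count_library_classes() {
  awk '
    /^>/ {
      id = substr($1, 2)            # first word of the header, minus ">"
      n = index(id, "#")
      cls = (n > 0) ? substr(id, n + 1) : "(no class tag)"
      count[cls]++
    }
    END { for (c in count) printf "%s\t%d\n", c, count[c] }
  ' "$1" | sort
}
```

If `count_library_classes consensi.fa` reports only "(no class tag)", the library itself holds no class information for RepeatMasker to use.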
svitlana.lukicheva (Brussels, Belgium) wrote (6 months ago):

Hello, this question was answered here: RepeatModeler GitHub
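If a run ends with only consensi.fa, one route worth trying (an assumption, not the content of the linked answer) is to run RepeatModeler's RepeatClassifier on that file and then pass the result to RepeatMasker with -lib. The sketch below wraps both steps. The names run, classify_and_mask and DRY_RUN are hypothetical, and it assumes that RepeatClassifier writes consensi.fa.classified beside its input and that the RepeatMasker libraries it depends on are already configured.

```shell
#!/usr/bin/env bash
# Sketch: classify a RepeatModeler consensus library, then mask with it.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

classify_and_mask() {
  local rm_dir="$1" genome="$2" threads="${3:-16}"
  # Assumption: RepeatClassifier writes consensi.fa.classified next to
  # consensi.fa, so it is run from inside the RM_* working directory.
  (cd "$rm_dir" && run RepeatClassifier -consensi consensi.fa) || return 1
  # Use the classified library as RepeatMasker's custom library.
  run RepeatMasker -pa "$threads" -lib "$rm_dir/consensi.fa.classified" "$genome"
}
```

For example, `DRY_RUN=1 classify_and_mask path/to/RM_dir species.fa 16` prints the two commands without running them. The `cd` still happens in dry-run mode, so the directory must exist.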
