16 months ago
Picasa ▴ 610

Dear all,

I have run RepeatMasker and I have this kind of result:

*.out file:

   SW   perc perc perc  query                 position in query    matching         repeat          position in repeat
score   div. del. ins.  sequence              begin end   (left)   repeat           class/family  begin  end    (left)  ID

428    7.3 22.8  0.0  ctg1371    230   365 (1868) C rnd-1_family-52  DNA/Maverick  (6794)    181     15   1
381   14.8 19.9  1.7  ctg1371    232   382 (1851) C rnd-1_family-50  Unknown        (938)    178      1   2 *


I don't understand why I get two different repeat classifications for the same region, with a large overlap between the two hits.

Is there any filtering I should do? In other words, is it possible that one annotation is more likely to be wrong than the other, and if so, on what basis can I decide?

repeat overlap • 408 views
16 months ago

From what I can see in that output, the overlap does not seem large (~180 bases out of 1900, no?).

Also, the classification of repeats by RM is not very strict: across the 1900 bases, most of the sequence can differ enough that the two are not catalogued as one family. On the other hand, many repeat classes share a substantial part of their content (e.g. integrases, RNA polymerases, ...), so it is not very surprising that they show some similarity to each other.


Sorry, I am not familiar with this output, but how did you calculate 1900 bp?

I have looked at the sequences rnd-1_family-52#DNA/Maverick and rnd-1_family-50#Unknown generated by RepeatModeler, and their sizes are 6975 bp and 1116 bp, respectively.
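
These two sizes can be cross-checked against the .out lines themselves. Below is a minimal sketch, assuming the usual RepeatMasker column layout, in which hits on the complement strand (C) list the repeat coordinates as (left), end, begin rather than begin, end, (left). The helper names `parse_out_line` and `consensus_length` are made up for illustration and are not part of RepeatMasker.

```python
def parse_out_line(line):
    """Split one RepeatMasker .out data line into a small dict."""
    f = line.split()
    strand = f[8]
    if strand == "C":
        # Complement hits: columns are (left), end, begin
        left, end, begin = f[11], f[12], f[13]
    else:
        # Forward hits: columns are begin, end, (left)
        begin, end, left = f[11], f[12], f[13]
    return {
        "query": f[4],
        "q_begin": int(f[5]),
        "q_end": int(f[6]),
        "strand": strand,
        "repeat": f[9],
        "class": f[10],
        "r_begin": int(begin),
        "r_end": int(end),
        "r_left": int(left.strip("()")),
        # Optional trailing asterisk after the ID column
        "flagged": len(f) > 15 and f[15] == "*",
    }


def consensus_length(hit):
    """Last matched consensus position plus the bases left beyond it."""
    return hit["r_end"] + hit["r_left"]


lines = [
    "428    7.3 22.8  0.0  ctg1371    230   365 (1868) C rnd-1_family-52  DNA/Maverick  (6794)    181     15   1",
    "381   14.8 19.9  1.7  ctg1371    232   382 (1851) C rnd-1_family-50  Unknown        (938)    178      1   2 *",
]
hits = [parse_out_line(l) for l in lines]

for h in hits:
    print(h["repeat"], h["class"], consensus_length(h))
# rnd-1_family-52 DNA/Maverick 6975
# rnd-1_family-50 Unknown 1116
```

The computed values (181 + 6794 and 178 + 938) agree with the 6975 bp and 1116 bp consensus lengths reported above, so the bracketed column is the number of consensus bases outside the match, not a length in the query.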


Yeah, my bad... I was looking at the wrong column. You are indeed correct with respect to their lengths.
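
For the overlap itself, the relevant columns are the begin/end positions in the query (230-365 and 232-382 on ctg1371). Here is a small sketch of that arithmetic, assuming 1-based inclusive coordinates as RepeatMasker reports them. The function names are only illustrative.

```python
def length(begin, end):
    """Length of a 1-based, inclusive interval."""
    return end - begin + 1


def overlap(a_begin, a_end, b_begin, b_end):
    """Number of positions shared by two 1-based, inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_begin, b_begin) + 1)


hit1 = (230, 365)  # rnd-1_family-52, DNA/Maverick
hit2 = (232, 382)  # rnd-1_family-50, Unknown

shared = overlap(*hit1, *hit2)
print(shared)                             # 134
print(f"{shared / length(*hit1):.1%}")    # 98.5%
print(f"{shared / length(*hit2):.1%}")    # 88.7%
```

So the two hits share 134 bp, which is nearly all of the 136 bp Maverick hit and most of the 151 bp Unknown hit. As far as I understand the RepeatMasker documentation, the trailing asterisk on the second line marks a match that overlaps a higher-scoring one, which is consistent with the lower SW score (381 vs 428) of the Unknown hit.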