Question: repeatmasker with my library
SeaStar (Ocean) wrote 15 months ago:

Hello! I'm analyzing the genome of a cephalopod. I have my genome.fa and my custom repeat library. I ran RepeatMasker with this command:

$:~/RepeatMasker -lib repeatlib.fa -dir output_file mygenome.fa

Is this correct, or do I have to add something such as the species? I ask because the generated output appears to contain no elements:

==================================================
file name: mygenome.fa       
sequences:          1000
total length:    1052553 bp  (1041046 bp excl N/X-runs)
GC level:         34.60 %
bases masked:     697079 bp ( 66.23 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
SINEs:                0            0 bp    0.00 %
      ALUs            0            0 bp    0.00 %
      MIRs            0            0 bp    0.00 %

LINEs:                0            0 bp    0.00 %
      LINE1           0            0 bp    0.00 %
      LINE2           0            0 bp    0.00 %
      L3/CR1          0            0 bp    0.00 %

LTR elements:         0            0 bp    0.00 %
      ERVL            0            0 bp    0.00 %
      ERVL-MaLRs      0            0 bp    0.00 %
      ERV_classI      0            0 bp    0.00 %
      ERV_classII     0            0 bp    0.00 %

DNA elements:         0            0 bp    0.00 %
     hAT-Charlie      0            0 bp    0.00 %
     TcMar-Tigger     0            0 bp    0.00 %

Unclassified:      5436       722760 bp   68.67 %

Total interspersed repeats:   722760 bp   68.67 %


Small RNA:            0            0 bp    0.00 %

Satellites:           0            0 bp    0.00 %
Simple repeats:    1511        93735 bp    8.91 %
Low complexity:       0            0 bp    0.00 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element


The query species was assumed to be homo          
RepeatMasker Combined Database: Dfam_Consensus-20181026, RepBase-20181026

run with rmblastn version 2.6.0+
The query was compared to unclassified sequences in ".../repeatlib.fa"

Thank you!


I think that for elements to show up in the summary table, the FASTA headers in the repeat library need to follow a specific format, e.g.:

>seq1#LTR/ERV1
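
A quick way to see which library entries already carry a classification is to test each header against the name#class/family pattern. The snippet below is only a minimal sketch: the regular expression and the function name classify_header are my own assumptions for illustration, not part of RepeatMasker, and the second header is one of the library entries quoted further down in this thread.

```python
import re

# Assumed pattern for a classified header: >name#class or >name#class/family
HEADER_RE = re.compile(r"^>(?P<name>[^#\s]+)#(?P<cls>[^/\s]+)(?:/(?P<family>\S+))?")

def classify_header(header):
    """Return (name, class, family) if the header is classified, else None."""
    match = HEADER_RE.match(header)
    if match is None:
        return None
    return match.group("name"), match.group("cls"), match.group("family")

headers = [
    ">seq1#LTR/ERV1",                  # classified: has #class/family
    ">Gypsy-5-I_BF1 RB:3e-08 89% 86",  # no '#', so nothing to classify by
]
for header in headers:
    print(header, "->", classify_header(header))
```

Headers that come back as None have no #class/family part, which would fit with all hits landing in the "Unclassified" row.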

written 15 months ago by microfuge

The masking did happen; see this line:

bases masked:     697079 bp ( 66.23 %)

But, as microfuge pointed out, the summary table may be incomplete simply because RepeatMasker cannot classify the repeats it found. That is not a big issue in itself: the most important thing is that it masked what needed to be masked.
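
As a sanity check, the masked percentage can be recomputed from the two summary lines in the question. This is just a small sketch; the regular expressions assume the line layout shown in the table above, and masked_fraction is a made-up helper name.

```python
import re

# The two relevant lines, copied from the summary table in the question.
tbl_text = """\
total length:    1052553 bp  (1041046 bp excl N/X-runs)
bases masked:     697079 bp ( 66.23 %)
"""

def masked_fraction(text):
    """Return (masked bp, total bp, masked percentage of total length)."""
    total = int(re.search(r"total length:\s+(\d+) bp", text).group(1))
    masked = int(re.search(r"bases masked:\s+(\d+) bp", text).group(1))
    return masked, total, 100.0 * masked / total

masked, total, pct = masked_fraction(tbl_text)
print(f"{masked} of {total} bp masked ({pct:.2f} %)")
```

The recomputed value agrees with the 66.23 % reported in the table.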

written 15 months ago by lieven.sterck

This is correct. The default summary table mostly checks for human repeats. There is a script called buildSummary.pl in the util folder of RepeatMasker that builds a better summary from the .out files.

For an example of its output, see the thread "RepeatMasker: understanding buildSummary.pl output".
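
To give a rough idea of what a summary based on the .out file looks like, here is a minimal Python sketch that tallies hits and masked length per class/family. It is not a reimplementation of buildSummary.pl: it assumes the usual 15-column .out layout (query begin and end in columns 6 and 7, class/family in column 11), it does not merge fragments or overlapping hits, and the sample rows are made up for illustration.

```python
from collections import defaultdict

def summarize_out(lines):
    """Tally (hit count, bp covered) per class/family from .out rows."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.split()
        # Data rows start with an integer score; header/blank lines do not.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        q_begin, q_end = int(fields[5]), int(fields[6])
        cls = fields[10]
        totals[cls][0] += 1
        totals[cls][1] += q_end - q_begin + 1
    return {cls: tuple(counts) for cls, counts in totals.items()}

# Made-up rows in the assumed .out layout (not real results).
sample = """\
   SW   perc perc perc  query      position in query    matching  repeat
score   div. del. ins.  sequence   begin end   (left)   repeat    class/family

  463   1.3  0.6  1.7  scaffold1    1   86  (914) +  Gypsy-5-I_BF1  Unknown        1 86 (0) 1
  300  10.0  0.0  0.0  scaffold1  201  250  (750) C  rep2           LTR/Gypsy    (0) 50  1  2
   25   0.0  0.0  0.0  scaffold2   11   40   (60) +  (AT)n          Simple_repeat  1 30 (0) 3
""".splitlines()

for cls, (hits, bp) in summarize_out(sample).items():
    print(f"{cls}\t{hits}\t{bp} bp")
```

With an unclassified library, every hit would be expected to fall into a single catch-all class here, which mirrors the "Unclassified" row in the default table.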

written 15 months ago by Philipp Bayer

OK. So the elements are not reported in this table, but I will probably find them in mygenome.out.fa, right? The .out.tbl file is not essential for me, so I don't need to build the new summary.

written 15 months ago by SeaStar

I don't know by heart, but there is certainly an output file (might be the out.tbl?) that records which elements were used to mask a given region, using the FASTA IDs from the library you provided.
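
As far as I know, the per-hit annotation table is the file ending in .out, with the .tbl being only the summary. Assuming its usual 15-column layout (query sequence in column 5, query begin and end in columns 6 and 7, matching repeat in column 10, class/family in column 11), a lookup of which library entries cover a region could be sketched like this. The function name and the sample rows are made up for illustration.

```python
def repeats_overlapping(lines, seq_id, start, end):
    """List (repeat name, class, begin, end) for hits overlapping a region."""
    hits = []
    for line in lines:
        fields = line.split()
        # Skip header and blank lines: data rows start with an integer score.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        q_begin, q_end = int(fields[5]), int(fields[6])
        if fields[4] == seq_id and q_begin <= end and q_end >= start:
            hits.append((fields[9], fields[10], q_begin, q_end))
    return hits

# Made-up rows in the assumed .out layout (not real results).
rows = [
    "463  1.3 0.6 1.7 scaffold1   1  86 (914) + Gypsy-5-I_BF1 Unknown 1 86 (0) 1",
    "300 10.0 0.0 0.0 scaffold1 201 250 (750) C rep2 Unknown (0) 50 1 2",
    "250  5.0 0.0 0.0 scaffold2  11  40  (60) + rep3 Unknown 1 30 (0) 3",
]

for hit in repeats_overlapping(rows, "scaffold1", 80, 210):
    print(hit)
```

The repeat name column carries the ID from the library FASTA header, so individual library entries can be traced even when the class is not known.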

written 15 months ago by lieven.sterck

Here are some elements from my library as an example:

>Gypsy-5-I_BF1 RB:3e-08 89% 86
GGTCAATAGGAGGTTGGATCTTAGTTGGCAGGGTGGTTTTATATTTCCTGCCATTCAGCATTTCTGCTGGGGATTTCATGTCAGCT
>Penelope-9_HM_Penelope_Hydra1 RB:2e-08 88% 267
AAGTTTCGTAAATCGCCATACAAGAACCAACATTTGAAATATCTTAATACTGTTACCAAACAAGTGAAAAGTGATAAAGGAATTTTCGTTAAATCTGACAAGACTAGAAATATTTATAAACTGAATAAGGAGCATTACATGAATTTACTTAGGAAGGAGATTGAAAAAAATTATAAAATTACAAATGGATGGACGCTCAGAAAGACCAATTTGGATGTTAAGAAACTAATGGAGAAATATAATATTGCGGACAGAACTGAACCTATA

Is the program not able to recognize elements like these?

written 15 months ago by SeaStar