Question: repeatmasker empty output
4 weeks ago, frida10 (rome) wrote:

Hi everybody, I finally got RepeatMasker to run, but it produces a summary table like this:

file name:     OB_100DEC.fa
sequences:     100
total length:  209466439 bp (184235452 bp excl N/X-runs)
GC level:      35.24 %
bases masked:  73100024 bp ( 34.90 %)

                   number of      length   percentage
                   elements*    occupied  of sequence

SINEs:                     0           0 bp    0.00 %
    ALUs                   0           0 bp    0.00 %
    MIRs                   0           0 bp    0.00 %

LINEs:                     0           0 bp    0.00 %
    LINE1                  0           0 bp    0.00 %
    LINE2                  0           0 bp    0.00 %
    L3/CR1                 0           0 bp    0.00 %

LTR elements:              0           0 bp    0.00 %
    ERVL                   0           0 bp    0.00 %
    ERVL-MaLRs             0           0 bp    0.00 %
    ERV_classI             0           0 bp    0.00 %
    ERV_classII            0           0 bp    0.00 %

DNA elements:              0           0 bp    0.00 %
    hAT-Charlie            0           0 bp    0.00 %
    TcMar-Tigger           0           0 bp    0.00 %

Unclassified:         560941    66374119 bp   31.69 %

Total interspersed repeats:     66374119 bp   31.69 %

Small RNA:                 0           0 bp    0.00 %

Satellites:                0           0 bp    0.00 %
Simple repeats:       290711    15358198 bp    7.33 %
Low complexity:            0           0 bp    0.00 %

* most repeats fragmented by insertions or deletions have been counted as one element

The query species was assumed to be homo
RepeatMasker Combined Database: Dfam_Consensus-20181026

run with rmblastn version 2.6.0+
The query was compared to unclassified sequences in ".../OB_100DEC_repeats_filtered1.fa"


I used RepeatScout to generate the library, and this was my command:

./RepeatMasker -s -lib /home/RepeatScout-1.0.5/OB_100DEC_repeats_filtered1.fa /home/Workdirectory/OB_100DEC.fa

Can anyone explain why no TEs are reported? The genome FASTA file is about 200 Mb and consists of the 100 largest contigs of my genome. Thank you.

written 4 weeks ago by frida10

I think the TE type information is taken from the FASTA headers of the repeat library, e.g.:

>SINEC2A2_CF#SINE/tRNA RepbaseID: SINEC2A2_CFXX

Can you check whether the FASTA headers of your repeat library look like this?
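
To make that check concrete, here is a minimal Python sketch. It is not part of RepeatMasker, and the function names repeat_class and tally_classes are made up for illustration. It assumes the header convention shown above, where the text between "#" and "/" in the sequence name is the repeat class, and it counts any header without a "#" as unclassified:

```python
from collections import Counter


def repeat_class(header: str) -> str:
    """Return the top-level class from a RepeatMasker-style FASTA header.

    Assumed convention: '>name#class/family optional description'.
    Headers whose name has no '#class' part are reported as 'Unclassified'.
    """
    tokens = header.lstrip(">").split()
    name = tokens[0] if tokens else ""
    if "#" not in name:
        return "Unclassified"
    classification = name.split("#", 1)[1]
    # Keep only the part before '/', e.g. 'SINE' from 'SINE/tRNA'.
    return classification.split("/", 1)[0] or "Unclassified"


def tally_classes(lines):
    """Count repeat classes over the header lines of a FASTA library."""
    return Counter(repeat_class(line) for line in lines if line.startswith(">"))


if __name__ == "__main__":
    # Two in-memory records: one annotated header and one without a class.
    example = [
        ">SINEC2A2_CF#SINE/tRNA RepbaseID: SINEC2A2_CFXX",
        "ACGT",
        ">R=3 (RR=4. TRF=0.000 NSEG=0.000)",
        "TAAGGCGG",
    ]
    for repeat_type, count in tally_classes(example).items():
        print(repeat_type, count)
```

An open file handle can be passed to tally_classes in place of the list. If every record comes back as "Unclassified", the library carries no class annotation in its headers.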

written 4 weeks ago by microfuge1.1k

Nope. I generated the library myself using RepeatScout, and the FASTA headers look like this:

>R=3 (RR=4. TRF=0.000 NSEG=0.000)
TAAGGCGGCGAGCTGGCAGAATCGTTAGCACGCCGGGCGAAATGCTTAGCGGTATTTCGTCTGTCTTTACGTTCTGAGTT
CAAATTCCGCCGAGGTCGACTTTGCCTTTCATCCTTTCGGGGTCGATAAAATAAGTACCAGTTGAGCACTGGGGTCGATG
TAATCGACTTACCCCCTCCCCCAAAATTTCTGGCCTTGTGCCTATATTAGAAACGATTATT
>R=4 (RR=5. TRF=0.122 NSEG=0.226)
ACACACACACACACACACACACACACATATATATATATATACATATATACGACGGGCTTCTTTCAGTTTCCGTCTACCAA
ATCCACTCACAAGGCTTTGGTCGGCCCGAGGCTATAGTAGAAGACACTTGCCCAAGGTGCCACGCAGTGGGACTGAACCC
GGAACCATGTGGTTGGTAAGCAAGCTACTTACCACACAGCCACTCCTGCGCCTATATATAT
>R=6 (RR=7. TRF=0.134 NSEG=0.247)
TTGTTTCAGTCATTTGACTGCGGCCATGCTGGAGCACCGCCTTTAGTCGAGCAAATCGACCCCAGGACTTATTCTTTGTA
AGCCTAGTACTTATTCTATCGGTCTCTTTTGCCGAACCGCTAAGTTACGGGGACGTAAACACACCAGCATCGGTTGTCAA
GCGATGTTGGGGGGACAAACACAGACACACAAACACACACACACACATACATATATATATATATATATATA

and so on.

modified 4 weeks ago • written 4 weeks ago by frida10

Thanks and sorry that I don't have a direct answer.

What we do is run RepeatModeler (http://www.repeatmasker.org/RepeatModeler/). Then we use the queryRepeatDatabase.pl script, which comes with RepeatMasker, to dump the repeats for the species we work with (e.g. -species Carnivora). Then we combine the repeat libraries from both (RepeatModeler and queryRepeatDatabase.pl) into a single repeat library file for masking.
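
The combining step can be as simple as concatenating the two FASTA files. As a rough Python sketch, assuming plain FASTA input: the helper names (read_fasta, merge_libraries, write_fasta) and the example records are hypothetical, and keeping only the first record for each sequence name is an assumption of this sketch, not something RepeatMasker is known to require:

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def merge_libraries(*libraries):
    """Concatenate FASTA libraries, keeping the first record for each name."""
    seen = set()
    merged = []
    for lines in libraries:
        for header, sequence in read_fasta(lines):
            fields = header.split()
            name = fields[0] if fields else ""
            if name in seen:
                continue  # assumption: a repeated name is a duplicate record
            seen.add(name)
            merged.append((header, sequence))
    return merged


def write_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text with wrapped sequence lines."""
    out = []
    for header, sequence in records:
        out.append(">" + header)
        out.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    # Made-up records standing in for the two library files.
    modeler_lib = [">rnd-1_family-1#Unknown", "ACGT", "ACGT"]
    database_lib = [">SINEC2A2_CF#SINE/tRNA", "GGCC", ">rnd-1_family-1#Unknown", "TTTT"]
    print(write_fasta(merge_libraries(modeler_lib, database_lib)), end="")
```

Open file handles can be passed to merge_libraries in place of the lists, and the text returned by write_fasta can be saved to the file given to RepeatMasker with -lib.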

Also, in case you installed RepeatMasker yourself: the repeat libraries need to be downloaded from Repbase (https://www.girinst.org/repbase/), which requires registration, and RepeatMasker needs to be configured to use them.

I once forgot this step and got very low masking, which was resolved after downloading the repeat libraries from Repbase.

written 4 weeks ago by microfuge1.1k

Hi! I downloaded Repbase from girinst months ago. Now I can't download it again, even though I'm registered, because it requires a subscription from my institute and I don't know how to obtain one. Anyway, as I said, I have this Repbase library installed with the program. In my message above, the output includes this: "The query species was assumed to be homo RepeatMasker Combined Database: Dfam_Consensus-20181026". I don't understand what the command line to run the program should be. Could you help me by showing an example?

modified 4 weeks ago • written 4 weeks ago by frida10

OK. You can check whether your installed repeat libraries contain different types of TEs with a script that comes with RepeatMasker:

queryRepeatDatabase.pl -species human -stat

This should list the different TE types (LINE, SINE, ...). If its output does not contain these TE types, then the RepeatMasker result you are getting would make sense.

written 4 weeks ago by microfuge1.1k

OK. In this Repbase library I found two .embl files with different types of TEs. They are not in FASTA format. Are they usable? The species in my analysis is an invertebrate (a cephalopod), so which species should I use? Thank you again.

written 4 weeks ago by frida10