Question: Repetitive elements on my genome
20 months ago by
Chironex30 wrote:

I'm analyzing mygenome.fa with RepeatScout and RepeatMasker to find transposable elements. I produced a library of repetitive elements with RepeatScout and then masked the TEs using RepeatMasker.

./build_lmer_table -l 14 -sequence mygenome.fa -freq mygenome.freq
./RepeatScout -sequence mygenome.fa -output mygenome_repeats.fa -freq mygenome.freq -l 14
cat mygenome_repeats.fa | ./filter-stage-1.prl > mygenome_repeats_filtered1.fa
./RepeatMasker -s -lib mygenome_repeats_filtered1.fa mygenome.fa

The RepeatMasker run above generates a masked genome using the (non-low-complexity, non-tandem) repeats.

cat mygenome_repeats_filtered1.fa | ./filter-stage-2.prl --cat mygenome.fa.out >mygenome_repeats_filtered2.fa

The filter-stage-2 step above filters out all (non-low-complexity, non-tandem) repeats that occur fewer than 10 times.

./RepeatMasker -pa 4 -s -lib mygenome_repeats_filtered2.fa -nolow -norna -no_is -gff mygenome.fa

After this I ran RepeatMasker on the filtered library itself:

./RepeatMasker -no_is mygenome_repeats_filtered2.fa

and I produced: mygenome_repeats_filt2.fa.masked mygenome_repeats_filt2.fa.tbl mygenome_repeats_filt2.fa.out

I would like to know how to find the locations of the transposable elements masked by RepeatMasker in my genome.

Suppose this is part of my repetitive-element library produced with RepeatScout as shown above:

less mygenome_repeats_filt2.fa

>R=3 (RR=4.  TRF=0.000 NSEG=0.000)
>R=4 (RR=5.  TRF=0.122 NSEG=0.226)
>R=6 (RR=7.  TRF=0.134 NSEG=0.247)

From the file mygenome_repeats_filt2.fa.out I can see that R=3 contains a transposable element:

   SW   perc perc perc  query     position in query     matching  repeat         position in repeat
score   div. del. ins.  sequence  begin  end   (left)   repeat    class/family   begin  end  (left)  ID

  348   20.6  0.0  0.0  R=3          78  140     (81) + AmnSINE2  SINE/tRNA-Deu     67  129   (229)  234

As you can see, I have the coordinates of this element within the library sequence, but I would like to find its exact location in the file that contains my assembled genome.


frida: Have you checked the RepeatMasker help to see if there is a way to change the repeat nucleotides to lower case (or to N's, in the case of hard masking) when they are found?

modified 20 months ago • written 20 months ago by GenoMax94k

No, I didn't. I can check it.

written 20 months ago by Chironex30

Isn't the "position in query begin/end" telling you what you are looking for, or am I missing something?

As GenoMax mentioned, using the hard/soft masking approach (and then analysing the masked genome) will in any case give the exact locations of the repeats in the genome. Note that soft masking (lower case) has to be activated explicitly with the -xsmall option; by default RepeatMasker replaces the masked bases with N's.

modified 20 months ago • written 20 months ago by lieven.sterck9.4k
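To illustrate the masking approach described in the comments above, here is a minimal Python sketch that reports the coordinates of masked runs per scaffold in a masked FASTA file. The function names `read_fasta` and `masked_intervals` are hypothetical helpers, not part of RepeatMasker. It assumes lower-case bases mark soft-masked repeats and N's mark hard-masked ones; note that hard-masked N's cannot be told apart from N's that were already in the assembly as gaps.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def masked_intervals(seq: str, soft: bool = True) -> List[Tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) spans of masked runs.

    soft=True  -> runs of lower-case letters (soft masking)
    soft=False -> runs of N/n (hard masking; also matches assembly gaps)
    """
    pattern = r"[a-z]+" if soft else r"[Nn]+"
    return [m.span() for m in re.finditer(pattern, seq)]


if __name__ == "__main__":
    # Tiny made-up example, not real data.
    demo = [">scaffold_1 demo", "ACGTacgtAC", "GTNNNNacgt"]
    for scaffold, seq in read_fasta(demo):
        for start, end in masked_intervals(seq, soft=True):
            print(scaffold, start, end, sep="\t")
```

For a real genome one would pass an open file handle (for example the .masked file) to `read_fasta` instead of the demo list. The spans only say where something was masked, not which repeat it was.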

I will have to look for this option in the help. Anyway, I'm interested in finding the position in mygenome.fa, which is divided into scaffolds, not the position in the query, which is R=...

written 20 months ago by Chironex30
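One point that may help here: the "query" column of a .out file holds whatever sequences were given to RepeatMasker as input. In mygenome_repeats_filt2.fa.out the queries are the library entries (R=3, ...), whereas the run on mygenome.fa should have written a mygenome.fa.out whose query column holds scaffold names, with the matching repeat being the library name. Below is a minimal Python sketch for turning such lines into scaffold coordinates. `parse_rm_out` and `to_bed` are hypothetical helper names, and the scaffold name and positions in the demo line are invented for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class RepeatHit(NamedTuple):
    scaffold: str      # "query sequence" column
    start: int         # 1-based, inclusive, as written in the .out file
    end: int
    strand: str        # "+" or "-" (the .out file uses "C" for complement)
    repeat: str        # "matching repeat" column
    repeat_class: str  # "repeat class/family" column


def parse_rm_out(lines: Iterable[str]) -> Iterator[RepeatHit]:
    """Yield one RepeatHit per annotation line of a RepeatMasker .out file."""
    for line in lines:
        fields = line.split()
        # Header and blank lines do not start with an integer SW score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        strand = "-" if fields[8] == "C" else "+"
        yield RepeatHit(fields[4], int(fields[5]), int(fields[6]),
                        strand, fields[9], fields[10])


def to_bed(hit: RepeatHit) -> str:
    """Format a hit as a BED line (0-based start, end-exclusive)."""
    return "\t".join([hit.scaffold, str(hit.start - 1), str(hit.end),
                      hit.repeat, "0", hit.strand])


if __name__ == "__main__":
    # Invented example line in .out layout; scaffold_12 is hypothetical.
    demo = [
        "   SW   perc perc perc  query  position in query  matching  repeat",
        "score   div. del. ins.  sequence  begin end (left)  repeat  class/family",
        "",
        "  348 20.6 0.0 0.0 scaffold_12 10078 10140 (5081) + R=3 Unspecified 67 129 (229) 234",
    ]
    for hit in parse_rm_out(demo):
        print(to_bed(hit))
```

To attach a name like AmnSINE2 to a genomic hit, the library-level annotation (R=3 contains AmnSINE2) would still have to be joined to these hits by library name.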

We are talking about this RepeatMasker, correct?

written 20 months ago by lieven.sterck9.4k

Hi, I think you have the repeat sequences in the output, so you just have to align your repeat sequences against your genome, with BLAST for example.

written 20 months ago by Titus910

I modified my post with some additional information to better explain the problem.

written 20 months ago by Chironex30

So you can BLAST mygenome_repeats_filt2.fa against your_genome.fa.

written 20 months ago by Titus910
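As a rough illustration of the BLAST suggestion, the following Python sketch turns BLAST tabular output into genome coordinates. It assumes the library was used as the query and the genome as the subject, and that the output is in the standard 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The name `parse_blast_tab` and the identity and length cut-offs are hypothetical choices, not recommended values.

```python
from typing import Iterable, Iterator, NamedTuple


class GenomeHit(NamedTuple):
    repeat: str      # query id, e.g. a library entry such as R=3
    scaffold: str    # subject id, i.e. the genome sequence hit
    start: int       # 1-based, inclusive, always start <= end
    end: int
    strand: str      # "-" when BLAST reports sstart > send
    identity: float  # percent identity of the alignment


def parse_blast_tab(lines: Iterable[str],
                    min_identity: float = 80.0,
                    min_length: int = 50) -> Iterator[GenomeHit]:
    """Yield filtered hits from 12-column BLAST tabular output."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        pident, length = float(f[2]), int(f[3])
        sstart, send = int(f[8]), int(f[9])
        # Thresholds are illustrative assumptions; tune them for real data.
        if pident < min_identity or length < min_length:
            continue
        strand = "+" if sstart <= send else "-"
        yield GenomeHit(f[0], f[1], min(sstart, send), max(sstart, send),
                        strand, pident)


if __name__ == "__main__":
    # Invented example rows; scaffold names and scores are hypothetical.
    demo = [
        "R=3\tscaffold_7\t95.24\t63\t3\t0\t78\t140\t5200\t5138\t1e-20\t99.0",
        "R=3\tscaffold_2\t70.00\t63\t19\t0\t78\t140\t100\t162\t1e-03\t40.0",
    ]
    for hit in parse_blast_tab(demo):
        print(*hit, sep="\t")
```

The qstart/qend columns (f[6], f[7]) say which part of the library sequence aligned, so they could be compared with the 78..140 interval reported for the transposable element inside R=3.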