Question: Repetitive elements in my genome
Asked 17 months ago by Chironex (Rome):

I'm analyzing mygenome.fa with RepeatScout and RepeatMasker to find transposable elements. I produced a library of repetitive elements with RepeatScout and then masked the TEs using RepeatMasker.

./build_lmer_table -l 14 -sequence mygenome.fa -freq mygenome.freq
./RepeatScout -sequence mygenome.fa -output mygenome_repeats.fa -freq mygenome.freq -l 14
cat mygenome_repeats.fa | ./filter-stage-1.prl > mygenome_repeats_filtered1.fa
./RepeatMasker -s -lib mygenome_repeats_filtered1.fa mygenome.fa

Generate a masked genome using (non-low-complexity, non-tandem) repeats

cat mygenome_repeats_filtered1.fa | ./filter-stage-2.prl --cat mygenome.fa.out >mygenome_repeats_filtered2.fa

Filter out all (non-low-complexity, non-tandem) repeats that occur fewer than 10 times

./RepeatMasker -pa 4 -s -lib mygenome_repeats_filtered2.fa -nolow -norna -no_is -gff mygenome.fa

After this I ran RepeatMasker on the filtered library itself:

./RepeatMasker -no_is mygenome_repeats_filtered2.fa

and it produced: mygenome_repeats_filt2.fa.masked, mygenome_repeats_filt2.fa.tbl, mygenome_repeats_filt2.fa.cat and mygenome_repeats_filt2.fa.out

I would like to know a way to find the location in my genome of the transposable elements masked by RepeatMasker.

Suppose that this is part of my repetitive element library, produced with RepeatScout as shown above:

less mygenome_repeats_filt2.fa

>R=3 (RR=4.  TRF=0.000 NSEG=0.000)
TAAGGCGGCGAGCTGGCAGAATCGTTAGCACGCCGGGCGAAATGCTTAGCGGTATTTCGTCTGTCTTTACGTTCTGAGTT
CAAATTCCGCCGAGGTCGACTTTGCCTTTCATCCTTTCGGGGTCGATAAAATAAGTACCAGTTGAGCACTGGGGTCGATG
TAATCGACTTACCCCCTCCCCCAAAATTTCTGGCCTTGTGCCTATATTAGAAACGATTATT
>R=4 (RR=5.  TRF=0.122 NSEG=0.226)
ACACACACACACACACACACACACACATATATATATATATACATATATACGACGGGCTTCTTTCAGTTTCCGTCTACCAA
ATCCACTCACAAGGCTTTGGTCGGCCCGAGGCTATAGTAGAAGACACTTGCCCAAGGTGCCACGCAGTGGGACTGAACCC
GGAACCATGTGGTTGGTAAGCAAGCTACTTACCACACAGCCACTCCTGCGCCTATATATAT
>R=6 (RR=7.  TRF=0.134 NSEG=0.247)
TTGTTTCAGTCATTTGACTGCGGCCATGCTGGAGCACCGCCTTTAGTCGAGCAAATCGACCCCAGGACTTATTCTTTGTA
AGCCTAGTACTTATTCTATCGGTCTCTTTTGCCGAACCGCTAAGTTACGGGGACGTAAACACACCAGCATCGGTTGTCAA
GCGATGTTGGGGGGACAAACACAGACACACAAACACACACACACACATACATATATATATATATATATATA

From the file mygenome_repeats_filt2.fa.out I can see that in R=3 there is a transposable element:

   SW   perc perc perc  query     position in query    matching        repeat              position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat          class/family      begin  end    (left)  ID

348   20.6  0.0  0.0  R=3          78   140   (81) + AmnSINE2        SINE/tRNA-Deu         67    129  (229) 234  

As you can see, I have the coordinates to find this element in the library, but I would like to find its exact location in the file that contains my assembled genome.


frida: Have you checked the RepeatMasker help to see whether there is a way to change the repeat nucleotides to lower case (or to N's, in the case of hard masking) when they are found?

written 17 months ago by genomax

No, I didn't. I can check it.

written 17 months ago by Chironex

Isn't the "position in query begin/end" telling you what you are looking for, or am I missing something?
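A hedged sketch of reading those "position in query" columns: when RepeatMasker is run on mygenome.fa itself (as in the -gff run in the question), the "query sequence" column of mygenome.fa.out holds the scaffold name, so begin/end are genome coordinates. The helper name rm_out_to_bed and the choice of a 6-column BED layout are assumptions made for illustration, not part of RepeatMasker.

```shell
# Hypothetical helper: convert a RepeatMasker .out table to 6-column BED
# (sequence, start, end, repeat|class, SW score, strand).
rm_out_to_bed() {
  awk -v OFS='\t' '
    # Data rows start with a numeric Smith-Waterman score; header lines do not.
    $1 ~ /^[0-9]+$/ && NF >= 14 {
      # Column 9 is "+" for the forward strand and "C" for the complement.
      strand = ($9 == "C") ? "-" : "+"
      # .out coordinates are 1-based inclusive; BED is 0-based, half-open.
      # The SW score is copied as-is and may exceed the usual 0-1000 BED range.
      print $5, $6 - 1, $7, $10 "|" $11, $1, strand
    }
  ' "$@"
}
```

Assumed usage: `rm_out_to_bed mygenome.fa.out > mygenome_repeats.bed`. Run on mygenome_repeats_filt2.fa.out instead, it would report positions inside the library sequences (R=3 and so on), not inside the scaffolds.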

As genomax mentioned, using the hard/soft masking approach (and then analysing the masked genome) will in any case give the exact locations of the repeats in the genome. Soft masking has to be activated explicitly (the -xsmall option), since by default RepeatMasker replaces repeats with N's rather than lower case.
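A minimal sketch of "analysing the masked genome", assuming a masked FASTA such as mygenome.fa.masked: it reports every run of lower-case, N or X characters as a BED interval. The helper name masked_runs_to_bed is hypothetical, and N runs that were already present in the assembly as gaps will be reported too, so the output is only an approximation of the repeat positions.

```shell
# Hypothetical helper: list masked runs (lower-case, N or X) of a masked
# FASTA file as 0-based, half-open BED intervals: sequence, start, end.
masked_runs_to_bed() {
  awk -v OFS='\t' '
    # Print the currently open run, if any, and close it.
    function emit_run() {
      if (run_start >= 0) print name, run_start, pos
      run_start = -1
    }
    BEGIN { run_start = -1 }
    /^>/ {
      emit_run()                 # a run cannot span two sequences
      name = substr($1, 2)       # first word of the header, without ">"
      pos = 0
      next
    }
    {
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        masked = (c != toupper(c) || c == "N" || c == "X")
        if (masked && run_start < 0) run_start = pos
        if (!masked && run_start >= 0) emit_run()
        pos++
      }
    }
    END { emit_run() }
  ' "$@"
}
```

Assumed usage: `masked_runs_to_bed mygenome.fa.masked > masked_regions.bed`. The per-character loop is slow on large genomes; it is meant to show the idea rather than to be fast.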

written 17 months ago by lieven.sterck

I have to find this option in the help. Anyway, I'm interested in finding the position in mygenome.fa, which is divided into scaffolds, not in the query, which is R=...

written 17 months ago by Chironex

We are talking about this RepeatMasker, correct?

written 17 months ago by lieven.sterck

Hi, I think you have the repeat sequences in the output, so you just have to align your repeat sequences against your genome, with BLAST for example.

Best

written 17 months ago by Titus

I modified my post with some additional information to better explain the problem.

written 17 months ago by Chironex

So you can BLAST mygenome_repeats_filt2.fa against your_genome.fa.
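One way this could look, as a sketch under assumptions: the BLAST+ commands in the comments use standard options, but the database and output file names are made up, and the helper blast6_to_bed is hypothetical. It turns tabular BLAST output (-outfmt 6) into BED intervals on the scaffolds, swapping the subject coordinates for minus-strand hits.

```shell
# Assumed BLAST+ calls (mygenome_db and repeats_vs_genome.tsv are made-up names):
#   makeblastdb -in mygenome.fa -dbtype nucl -out mygenome_db
#   blastn -query mygenome_repeats_filt2.fa -db mygenome_db -outfmt 6 -out repeats_vs_genome.tsv

# Hypothetical helper: convert -outfmt 6 rows to 6-column BED on the subject
# (scaffold): scaffold, start, end, query id, bit score, strand.
blast6_to_bed() {
  awk -v OFS='\t' '
    NF >= 12 {
      # Columns 9 and 10 are sstart and send; sstart > send marks a minus-strand hit.
      if (($9 + 0) <= ($10 + 0)) { lo = $9;  hi = $10; strand = "+" }
      else                       { lo = $10; hi = $9;  strand = "-" }
      # BLAST coordinates are 1-based inclusive; BED is 0-based, half-open.
      print $2, lo - 1, hi, $1, $12, strand
    }
  ' "$@"
}
```

Assumed usage: `blast6_to_bed repeats_vs_genome.tsv > repeats_vs_genome.bed`. Each library sequence will typically hit many scaffold positions, so some filtering on identity or alignment length is likely needed afterwards.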

written 17 months ago by Titus