Repetitive elements on my genome
3.6 years ago
Chironex ▴ 50

I'm analyzing mygenome.fa with RepeatScout and RepeatMasker to find transposable elements. I built a library of repetitive elements with RepeatScout and then masked the TEs using RepeatMasker.

./build_lmer_table -l 14 -sequence mygenome.fa -freq mygenome.freq
./RepeatScout -sequence mygenome.fa -output mygenome_repeats.fa -freq mygenome.freq -l 14
cat mygenome_repeats.fa | ./filter-stage-1.prl > mygenome_repeats_filtered1.fa


Filter out all (non-low-complexity, non-tandem) repeats that occur fewer than 10 times:

cat mygenome_repeats_filtered1.fa | ./filter-stage-2.prl --cat mygenome.fa.out >mygenome_repeats_filtered2.fa


Generate a masked genome using the (non-low-complexity, non-tandem) repeats:

./RepeatMasker -pa 4 -s -lib mygenome_repeats_filtered2.fa -nolow -norna -no_is -gff mygenome.fa


Then I ran RepeatMasker on the filtered library itself:

./RepeatMasker -no_is mygenome_repeats_filtered2.fa


This produced: mygenome_repeats_filt2.fa.masked, mygenome_repeats_filt2.fa.tbl, mygenome_repeats_filt2.fa.cat and mygenome_repeats_filt2.fa.out

I would like to know how to find the locations in my genome of the transposable elements masked by RepeatMasker.

Suppose this is part of my repetitive-element library, produced with RepeatScout as shown above:

less mygenome_repeats_filt2.fa

>R=3 (RR=4.  TRF=0.000 NSEG=0.000)
TAAGGCGGCGAGCTGGCAGAATCGTTAGCACGCCGGGCGAAATGCTTAGCGGTATTTCGTCTGTCTTTACGTTCTGAGTT
CAAATTCCGCCGAGGTCGACTTTGCCTTTCATCCTTTCGGGGTCGATAAAATAAGTACCAGTTGAGCACTGGGGTCGATG
TAATCGACTTACCCCCTCCCCCAAAATTTCTGGCCTTGTGCCTATATTAGAAACGATTATT
>R=4 (RR=5.  TRF=0.122 NSEG=0.226)
ACACACACACACACACACACACACACATATATATATATATACATATATACGACGGGCTTCTTTCAGTTTCCGTCTACCAA
ATCCACTCACAAGGCTTTGGTCGGCCCGAGGCTATAGTAGAAGACACTTGCCCAAGGTGCCACGCAGTGGGACTGAACCC
GGAACCATGTGGTTGGTAAGCAAGCTACTTACCACACAGCCACTCCTGCGCCTATATATAT
>R=6 (RR=7.  TRF=0.134 NSEG=0.247)
TTGTTTCAGTCATTTGACTGCGGCCATGCTGGAGCACCGCCTTTAGTCGAGCAAATCGACCCCAGGACTTATTCTTTGTA
AGCCTAGTACTTATTCTATCGGTCTCTTTTGCCGAACCGCTAAGTTACGGGGACGTAAACACACCAGCATCGGTTGTCAA
GCGATGTTGGGGGGACAAACACAGACACACAAACACACACACACACATACATATATATATATATATATATA


From the file mygenome_repeats_filt2.fa.out I can see that R=3 contains a transposable element:

   SW   perc perc perc  query     position in query    matching        repeat              position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat          class/family      begin  end    (left)  ID

348   20.6  0.0  0.0  R=3          78   140   (81) + AmnSINE2        SINE/tRNA-Deu         67    129  (229) 234


As you can see, I have the coordinates of this element within the library sequence, but I would like to find its exact location in the file that contains my assembled genome.

frida: Have you checked the RepeatMasker help to see if there is a way to change the repeat nucleotides to lower case (or N's in case of hard masking) when they are found?

No, I didn't. I can check it.

Doesn't the "position in query begin/end" tell you what you are looking for, or am I missing something?

As genomax mentioned, the hard/soft masking approach (followed by analysing the masked genome) will in any case give you the exact locations of the repeats in the genome. Soft masking has to be enabled explicitly (the -xsmall option), since by default RepeatMasker hard-masks repeats with N's.
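
A minimal sketch of the "analyse the masked genome" step, assuming a masked FASTA such as mygenome.fa.masked is read line by line. It reports every run of lowercase bases (soft masking) or of N/X (hard masking) as a 0-based, half-open interval per scaffold. The names `read_fasta` and `masked_intervals` are made up for this illustration and are not part of RepeatMasker. Keep in mind that in a scaffolded assembly a run of N's can also be an assembly gap, so soft-masked input is the less ambiguous choice.

```python
import re
from typing import Iterable, Iterator, Tuple

# A run of lowercase bases (soft masking) or of N/X characters (hard masking).
MASKED_RUN = re.compile(r"[a-z]+|[NX]+")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence name, full sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def masked_intervals(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (name, start, end) for each masked run; start is 0-based, end exclusive."""
    for name, seq in read_fasta(lines):
        for match in MASKED_RUN.finditer(seq):
            yield name, match.start(), match.end()


# Possible usage (file name taken from the question, adjust as needed):
# with open("mygenome.fa.masked") as handle:
#     for name, start, end in masked_intervals(handle):
#         print(name, start, end, sep="\t")
```

The intervals only say where something was masked, not which repeat matched there; for that the annotation table written next to the masked file is still needed.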

I will have to find this option in the help. Anyway, I'm interested in finding the position in mygenome.fa, which is divided into scaffolds, not the position in the query, which is R=...
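
For scaffold coordinates, the .out file to look at is the one from the run on the genome itself. RepeatMasker names its output after the input file, so that would be mygenome.fa.out for the command in the question, and there the query sequence column holds the scaffold name instead of R=... The following is a hedged sketch of converting such a file into BED-like intervals. `RepeatHit` and `parse_rm_out` are hypothetical names, and the column layout is assumed to match the table shown in the question.

```python
from typing import Iterable, Iterator, NamedTuple


class RepeatHit(NamedTuple):
    seq_id: str   # query sequence (the scaffold name when the genome was the input)
    start: int    # 0-based start, BED style
    end: int      # exclusive end (numerically equal to the 1-based end in .out)
    strand: str   # "+" or "-"
    repeat: str   # name of the matching repeat
    family: str   # repeat class/family


def parse_rm_out(lines: Iterable[str]) -> Iterator[RepeatHit]:
    """Yield one RepeatHit per annotation line of a RepeatMasker .out file."""
    for line in lines:
        fields = line.split()
        # Header and blank lines do not start with an integer SW score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        # RepeatMasker writes "C" for matches on the complementary strand.
        strand = "-" if fields[8] == "C" else "+"
        yield RepeatHit(fields[4], begin - 1, end, strand, fields[9], fields[10])


# Possible usage (file name as produced by the genome run in the question):
# with open("mygenome.fa.out") as handle:
#     for hit in parse_rm_out(handle):
#         print(*hit, sep="\t")
```

The -gff option used in the question also makes RepeatMasker write the same annotations as a GFF file, which may be enough if no further filtering is needed.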

Hi, I think you have the repeat sequences in the output. So you just have to align your repeat sequences against your genome, with BLAST for example.

Best

I modified my post with some additional information to better explain the problem.

So you can BLAST mygenome_repeats_filt2.fa against your_genome.fa.
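
A rough sketch of that BLAST route, assuming BLAST+ (`makeblastdb` and `blastn`) is installed and on the PATH. The helper names `blast_commands`, `run_blast` and `parse_blast_tab` and the output file name are placeholders for illustration. The parser expects the default 12-column tabular output (`-outfmt 6`), where minus-strand hits are reported with subject start greater than subject end.

```python
import subprocess
from typing import Iterable, Iterator, List, Tuple


def blast_commands(library: str, genome: str, out_tsv: str) -> List[List[str]]:
    """Build the makeblastdb and blastn command lines without running them."""
    return [
        ["makeblastdb", "-in", genome, "-dbtype", "nucl"],
        ["blastn", "-query", library, "-db", genome, "-outfmt", "6", "-out", out_tsv],
    ]


def run_blast(library: str, genome: str, out_tsv: str) -> None:
    """Run both commands in order; raises CalledProcessError if one fails."""
    for command in blast_commands(library, genome, out_tsv):
        subprocess.run(command, check=True)


def parse_blast_tab(lines: Iterable[str]) -> Iterator[Tuple[str, str, int, int, str, float]]:
    """Yield (query, scaffold, start, end, strand, percent identity) per hit.

    Coordinates are 1-based and inclusive, as BLAST reports them.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        sstart, send = int(fields[8]), int(fields[9])
        strand = "+" if sstart <= send else "-"
        yield fields[0], fields[1], min(sstart, send), max(sstart, send), strand, float(fields[2])


# Possible usage (file names from the thread; "hits.tsv" is a made-up output name):
# run_blast("mygenome_repeats_filt2.fa", "mygenome.fa", "hits.tsv")
# with open("hits.tsv") as handle:
#     for hit in parse_blast_tab(handle):
#         print(*hit, sep="\t")
```

Each library sequence will usually hit many places in the genome, so some filtering on identity and alignment length is likely needed before treating the hits as element locations.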