Question

Repetitive elements on my genome

5.0 years ago

Chironex ▴ 40

I'm analyzing mygenome.fa with RepeatScout and RepeatMasker to find transposable elements. I built a library of repetitive elements with RepeatScout and then masked the TEs using RepeatMasker.

./build_lmer_table -l 14 -sequence mygenome.fa -freq mygenome.freq
./RepeatScout -sequence mygenome.fa -output mygenome_repeats.fa -freq mygenome.freq -l 14
cat mygenome_repeats.fa | ./filter-stage-1.prl > mygenome_repeats_filtered1.fa
./RepeatMasker -s -lib mygenome_repeats_filtered1.fa mygenome.fa

Generate a masked genome using (non-low-complexity, non-tandem) repeats

cat mygenome_repeats_filtered1.fa | ./filter-stage-2.prl --cat mygenome.fa.out >mygenome_repeats_filtered2.fa

Filter out all (non-low-complexity, non-tandem) repeats that occur fewer than 10 times

./RepeatMasker -pa 4 -s -lib mygenome_repeats_filtered2.fa -nolow -norna -no_is -gff mygenome.fa

After this I ran RepeatMasker:

./RepeatMasker -no_is mygenome_repeats_filtered2.fa

which produced: mygenome_repeats_filt2.fa.masked, mygenome_repeats_filt2.fa.tbl, mygenome_repeats_filt2.fa.cat and mygenome_repeats_filt2.fa.out

I would like to know how to find the locations in my genome of the transposable elements masked by RepeatMasker.

Suppose this is part of my repetitive element library, produced with RepeatScout as shown above:

less mygenome_repeats_filt2.fa

>R=3 (RR=4.  TRF=0.000 NSEG=0.000)
TAAGGCGGCGAGCTGGCAGAATCGTTAGCACGCCGGGCGAAATGCTTAGCGGTATTTCGTCTGTCTTTACGTTCTGAGTT
CAAATTCCGCCGAGGTCGACTTTGCCTTTCATCCTTTCGGGGTCGATAAAATAAGTACCAGTTGAGCACTGGGGTCGATG
TAATCGACTTACCCCCTCCCCCAAAATTTCTGGCCTTGTGCCTATATTAGAAACGATTATT
>R=4 (RR=5.  TRF=0.122 NSEG=0.226)
ACACACACACACACACACACACACACATATATATATATATACATATATACGACGGGCTTCTTTCAGTTTCCGTCTACCAA
ATCCACTCACAAGGCTTTGGTCGGCCCGAGGCTATAGTAGAAGACACTTGCCCAAGGTGCCACGCAGTGGGACTGAACCC
GGAACCATGTGGTTGGTAAGCAAGCTACTTACCACACAGCCACTCCTGCGCCTATATATAT
>R=6 (RR=7.  TRF=0.134 NSEG=0.247)
TTGTTTCAGTCATTTGACTGCGGCCATGCTGGAGCACCGCCTTTAGTCGAGCAAATCGACCCCAGGACTTATTCTTTGTA
AGCCTAGTACTTATTCTATCGGTCTCTTTTGCCGAACCGCTAAGTTACGGGGACGTAAACACACCAGCATCGGTTGTCAA
GCGATGTTGGGGGGACAAACACAGACACACAAACACACACACACACATACATATATATATATATATATATA

From the file mygenome_repeats_filt2.fa.out I can see that R=3 contains a transposable element:

   SW  perc perc perc  query     position in query     matching  repeat         position in repeat
score  div. del. ins.  sequence  begin  end  (left)    repeat    class/family   begin  end  (left)  ID

  348  20.6  0.0  0.0  R=3          78  140    (81)  + AmnSINE2  SINE/tRNA-Deu     67  129   (229)  234

As you can see, I have the coordinates of this element within the library sequence, but I would like to find its exact position in the file that contains my assembled genome.

transposon • 1.7k views


frida: Have you checked the RepeatMasker help to see if there is a way to change the repeat nucleotides to lowercase (or N's in case of hard masking) when they are found?
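If you go the masking route, something like the following could pull the masked coordinates back out of the masked FASTA. This is only a rough sketch: it assumes repeats appear as lowercase letters (soft masking) or as N/X (hard masking), the function name `masked_intervals` is made up for illustration, and any real N gaps in your assembly would be reported as well.

```python
import re
from typing import Dict, Iterable, List, Tuple


def masked_intervals(fasta_lines: Iterable[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Return 1-based inclusive (start, end) runs of masked bases per sequence.

    Assumption: masked bases are lowercase (soft masking) or N/X (hard masking).
    """
    seqs: Dict[str, List[str]] = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)

    result: Dict[str, List[Tuple[int, int]]] = {}
    for name, chunks in seqs.items():
        seq = "".join(chunks)
        # m.start() is 0-based, m.end() is exclusive, so (start + 1, end) is
        # the same run expressed as 1-based inclusive coordinates.
        result[name] = [(m.start() + 1, m.end()) for m in re.finditer(r"[a-zNX]+", seq)]
    return result


# Toy input, invented for illustration only.
example = [">scaffold_1", "ACGTacgtac", "gtACGTNNNN", ">scaffold_2", "ACGT"]
for scaffold, runs in masked_intervals(example).items():
    print(scaffold, runs)
```

For a real file you would pass an open file handle, e.g. `masked_intervals(open("mygenome.fa.masked"))`. Note that this only tells you where something was masked, not which repeat family it was.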

5.0 years ago by GenoMax 141k

No, I didn't. I will check it.

5.0 years ago by Chironex ▴ 40

Doesn't the "position in query begin/end" column tell you what you are looking for, or am I missing something?

As genomax mentioned, using the hard/soft masking approach (and then analysing the masked genome) will also give you the exact locations of the repeats in the genome. Note that soft masking (lowercase) has to be activated explicitly with -xsmall; by default RepeatMasker replaces the repeats with N's.
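To illustrate the first point, here is a small sketch of how the "position in query" columns could be read from a .out file. The names `RepeatHit`, `parse_rm_out` and `to_bed` are invented for this example, and the column layout is assumed to match the table in the question. As far as I know, the query column holds the name of whatever sequence was given to RepeatMasker, so in the .out file from a run on mygenome.fa it should be the scaffold name.

```python
from typing import Iterable, Iterator, NamedTuple


class RepeatHit(NamedTuple):
    query: str         # sequence that was masked (e.g. a scaffold, or R=3)
    begin: int         # 1-based start in the query
    end: int           # 1-based inclusive end in the query
    strand: str        # "+" or "-"
    repeat: str        # matching repeat name
    repeat_class: str  # repeat class/family


def parse_rm_out(lines: Iterable[str]) -> Iterator[RepeatHit]:
    """Yield one RepeatHit per data row of a RepeatMasker .out file."""
    for line in lines:
        fields = line.split()
        # Data rows start with the integer SW score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        # Complement-strand matches are written as "C" in the .out file.
        strand = "-" if fields[8] == "C" else "+"
        yield RepeatHit(fields[4], int(fields[5]), int(fields[6]),
                        strand, fields[9], fields[10])


def to_bed(hit: RepeatHit) -> str:
    """Format a hit as a BED6 line (0-based start, end-exclusive)."""
    return f"{hit.query}\t{hit.begin - 1}\t{hit.end}\t{hit.repeat}\t0\t{hit.strand}"


# The row from the question, including its two header lines.
sample = [
    "   SW   perc perc perc  query     position in query    matching        repeat              position in repeat",
    "score   div. del. ins.  sequence  begin end   (left)   repeat          class/family      begin  end    (left)  ID",
    "",
    "348   20.6  0.0  0.0  R=3          78   140   (81) + AmnSINE2        SINE/tRNA-Deu         67    129  (229) 234",
]
for hit in parse_rm_out(sample):
    print(to_bed(hit))
```

The BED conversion is just one convenient choice; it makes the coordinates usable with tools such as bedtools.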

5.0 years ago by lieven.sterck 15k

I will have to look for this option in the help. Anyway, I'm interested in finding the position in mygenome.fa, which is divided into scaffolds, not the position in the query, which is R=...

5.0 years ago by Chironex ▴ 40

We are talking about this RepeatMasker, correct?

5.0 years ago by lieven.sterck 15k

Hi, I think you have the repeat sequences in the output. So you just have to align your repeat sequences against your genome, with BLAST for example.

Best

5.0 years ago by Titus ▴ 910

I edited my post and added some more information to better explain the problem.

5.0 years ago by Chironex ▴ 40

So you can BLAST mygenome_repeats_filt2.fa against your_genome.fa.
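Once you have the hits in tabular format (`-outfmt 6`), the element coordinates inside a library sequence (e.g. 78..140 in R=3) can be shifted onto the scaffold. Below is a rough sketch of that projection. The function name `project_interval` and the example hit lines are invented for illustration, and gaps in the alignment are ignored, so the result is only approximate for gapped hits.

```python
from typing import Optional, Tuple


def project_interval(blast_line: str, q_from: int, q_to: int) -> Optional[Tuple[str, int, int, str]]:
    """Map query positions q_from..q_to (1-based, inclusive) onto the subject
    through one BLAST tabular (-outfmt 6) hit.

    Assumption: the alignment is treated as ungapped, so coordinates are
    shifted linearly. Returns (subject id, start, end, strand) or None if
    the hit does not cover any part of the interval.
    """
    f = blast_line.rstrip("\n").split("\t")
    # Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    # qstart qend sstart send evalue bitscore
    sseqid = f[1]
    qstart, qend, sstart, send = (int(x) for x in f[6:10])

    # Keep only the part of the interval that the hit actually covers.
    lo, hi = max(q_from, qstart), min(q_to, qend)
    if lo > hi:
        return None

    if sstart <= send:  # plus-strand hit
        return sseqid, sstart + (lo - qstart), sstart + (hi - qstart), "+"
    # Minus-strand hit: subject coordinates decrease as query coordinates increase.
    return sseqid, sstart - (hi - qstart), sstart - (lo - qstart), "-"


# Invented hit lines: the full-length R=3 (221 bp) matching a made-up scaffold.
plus_hit = "R=3\tscaffold_12\t98.19\t221\t4\t0\t1\t221\t5001\t5221\t1e-100\t380"
minus_hit = "R=3\tscaffold_12\t98.19\t221\t4\t0\t1\t221\t5221\t5001\t1e-100\t380"
print(project_interval(plus_hit, 78, 140))
print(project_interval(minus_hit, 78, 140))
```

Each repeat will usually hit the genome many times, so you would apply this to every hit line for R=3 to get all copies of the element.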

5.0 years ago by Titus ▴ 910