Build consensus sequences from repeat masker output
3.8 years ago

Hi,

I have a RepeatMasker output file for a new organism (a crustacean), and I want to use the transposable elements in this species to analyse piRNAs, using my own short-read sequencing data.

The problem is that I would like consensus sequences for the transposable elements in this species, rather than every position in the genome where a transposon occurs. If the same transposon exists in 100 copies in the genome, it appears 100 times in the RepeatMasker output.
Ideally I would like to end up with a multi-FASTA file like the ones in Repbase, but I am a bit lost about how to use the RepeatMasker output to achieve this.

Any suggestions would be very helpful. Thanks!

repeatmasker Transposable elements • 1.2k views

I think the easiest way would be to convert the RepeatMasker coordinates to a BED file and then use bedtools to extract the sequences from the genome FASTA. Once you have the sequences, you can build a consensus.
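As a rough illustration of the coordinate conversion, here is a minimal Python sketch that turns RepeatMasker `.out` rows into BED6 lines. It assumes the standard whitespace-separated `.out` layout (score, divergence, deletions, insertions, query sequence, begin, end, left, strand, repeat name, class/family, ...). The function name `out_to_bed` and the `name#class` label are my own choices for illustration, not something RepeatMasker or bedtools defines.

```python
def out_to_bed(lines):
    """Yield BED6 lines from the lines of a RepeatMasker .out file."""
    for line in lines:
        fields = line.split()
        # Header and blank lines are skipped: data rows start with an integer score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        score, chrom = fields[0], fields[4]
        start, end = int(fields[5]), int(fields[6])
        # RepeatMasker marks the reverse strand with "C" (complement).
        strand = "-" if fields[8] == "C" else "+"
        # Label each copy with its repeat name and class/family (illustrative choice).
        name = f"{fields[9]}#{fields[10]}"
        # .out coordinates are 1-based inclusive; BED is 0-based, half-open.
        yield f"{chrom}\t{start - 1}\t{end}\t{name}\t{score}\t{strand}"
```

Writing these lines to a file gives a BED file that bedtools can use to pull the copies out of the genome FASTA.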


Thanks for the comment. I have already extracted the FASTA sequences. I guess the way forward would be to do some sort of clustering on the sequences, but I am not sure about that.


You should have the name of each repeat in the RepeatMasker output. You can start by grouping the sequences by that name and then build a consensus for each group.
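A minimal Python sketch of that idea follows. It assumes the extracted FASTA headers start with the repeat name followed by `#` or `::` (a hypothetical layout that depends on how the sequences were extracted), and it assumes the copies in each group have already been aligned to equal length with an external multiple-sequence aligner. The helper names `group_by_repeat` and `majority_consensus` are made up for illustration, and a plain majority vote is only one simple way to call a consensus.

```python
from collections import Counter, defaultdict


def group_by_repeat(fasta_lines):
    """Map repeat name -> list of sequences, reading FASTA lines."""
    groups = defaultdict(list)
    name, chunks = None, []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                groups[name].append("".join(chunks))
            # Assumed header layout: >NAME#CLASS... or >NAME::coords
            name = line[1:].split("#")[0].split("::")[0]
            chunks = []
        elif line:
            chunks.append(line)
    if name is not None:
        groups[name].append("".join(chunks))
    return dict(groups)


def majority_consensus(aligned, gap="-"):
    """Column-wise majority call over equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        base, _ = Counter(b.upper() for b in column).most_common(1)[0]
        # Columns where a gap is the most common character are dropped.
        if base != gap:
            consensus.append(base)
    return "".join(consensus)
```

Each group from `group_by_repeat` would be aligned separately, and the aligned copies passed to `majority_consensus` to get one sequence per repeat name.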

3.6 years ago
bioinfo • 0

There is a script shipped in the RepeatMasker directory that I assume will solve your problem. It can be found here:

repeatMasker/util/queryRepeatDatabase.pl

With it you can retrieve from the database all repeats for the taxon you are interested in.

Run it as follows:

util/queryRepeatDatabase.pl -species YourSpecies  > YourSpecies_repetitions.lib
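
To check what the command above actually returned for a given species, the resulting library can be summarised by repeat class. This is a small Python sketch that assumes the `.lib` file is FASTA with headers of the form `>name#class/family`; the function name `count_repeat_classes` is hypothetical.

```python
from collections import Counter


def count_repeat_classes(fasta_lines):
    """Count library entries per class/family from FASTA header lines."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue
        tokens = line[1:].split()
        if not tokens:
            continue
        # Assumed header layout: >name#class/family, optionally followed by notes.
        _, _, repeat_class = tokens[0].partition("#")
        counts[repeat_class or "Unknown"] += 1
    return counts
```

For example, `count_repeat_classes(open("YourSpecies_repetitions.lib"))` would give a quick view of how many consensus sequences of each class the database holds for that species.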