I have a RepeatMasker output file for a new organism (a crustacean), and I want to use the transposable elements in this species to analyse piRNAs using my own sequencing data (short reads).
The problem is that I would like one consensus sequence per transposable element in this species, rather than every position in the genome where a transposon occurs. If the same transposon exists in 100 copies in the genome, it appears 100 times in the RepeatMasker output.
Ideally I would like to end up with a multi-FASTA file like the ones in Repbase, but I am a bit lost about how to use the RepeatMasker output to achieve this.
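To make the redundancy concrete, here is a minimal sketch of collapsing the hits by repeat name. It assumes the standard whitespace-separated RepeatMasker `.out` layout, with the matching repeat in the 10th column and its class/family in the 11th. The function name `count_repeat_copies` and the example rows (scaffold names, `Gypsy-1_XX`, `Mariner-3_XX`) are made up for illustration and are not from my real data.

```python
from collections import Counter


def count_repeat_copies(lines):
    """Count genomic copies per (repeat name, class/family) in RepeatMasker .out lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank lines and header lines: data rows start with an integer SW score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        # Assumed column layout: index 9 = matching repeat, index 10 = class/family.
        repeat_name, repeat_class = fields[9], fields[10]
        counts[(repeat_name, repeat_class)] += 1
    return counts


# Hypothetical example rows in the .out layout (not real annotations).
example = """\
   SW   perc perc perc  query     position in query    matching  repeat       position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat    class/family begin end (left) ID

  1320  15.6  6.2  0.0  scaffold1   100   400  (9600) +  Gypsy-1_XX  LTR/Gypsy     1  320   (80)  1
   980  12.1  0.0  1.3  scaffold2  2000  2250  (7750) C  Gypsy-1_XX  LTR/Gypsy  (50)  350   100   2
   450  20.0  3.0  2.0  scaffold2  5000  5100  (4900) +  Mariner-3_XX DNA/TcMar-Mariner 10 110 (1190) 3
""".splitlines()

# Print each repeat once, with its class/family and number of genomic copies.
for (name, repeat_class), n in count_repeat_copies(example).most_common():
    print(name, repeat_class, n)
```

This only gives me one line per repeat name with its copy number. It does not give me the sequences themselves, which is the part I am missing.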
Any suggestions would be very helpful! Thanks