I have a list of 7000 LTR elements predicted by running LTRharvest and LTRdigest on a plant genome. I want to analyze LTR dynamics, so I need the elements that have two intact LTRs to be grouped into families.
Is there a better way to group them into families than a homology-based BLAST approach? The sequences are too divergent from their nearest relatives in Repbase.
Is it advisable to cluster them first and then use the protein domains to group them? I am just looking for suggestions.
Thanks a lot