I have searched several forums but can't find how to solve my problem, although I'm sure this has been done before. Here is the problem: I'm mapping a library of repeat sequences (Transposable Elements, TEs) onto a genome, so one query can have multiple matches in the genome. Moreover, at a given genomic position I can get overlapping matches from several different TEs in my library, and I want to filter the output file so that only the best hit is kept at each location of the genome.
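One common way to keep "the best hit per location" is a greedy filter. Sort all hits by score, best first, and accept a hit only if it does not overlap any hit already accepted. The sketch below is a minimal illustration, not RepeatMasker's method. It assumes the hits have already been parsed into BED-style 0-based, half-open coordinates, and that `score` is some per-hit quality value from the mapper (for example an alignment score) where higher is better. `Hit`, `overlaps` and `best_non_overlapping` are hypothetical names chosen for the example.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One TE match on the genome (hypothetical record layout)."""
    chrom: str
    start: int    # 0-based, inclusive (BED-style, assumed)
    end: int      # exclusive
    te: str       # name of the TE from the library
    score: float  # assumed: higher means a better match


def overlaps(a: Hit, b: Hit) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def best_non_overlapping(hits: List[Hit]) -> List[Hit]:
    """Greedy filter: visit hits from best to worst score and keep a hit
    only if it overlaps none of the hits kept so far."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    # Return the survivors in genomic order.
    return sorted(kept, key=lambda h: (h.chrom, h.start))


if __name__ == "__main__":
    hits = [
        Hit("chr1", 100, 500, "TE_A", 350.0),
        Hit("chr1", 120, 300, "TE_B", 120.0),
        Hit("chr1", 450, 900, "TE_C", 200.0),
        Hit("chr1", 1000, 1200, "TE_B", 90.0),
    ]
    for h in best_non_overlapping(hits):
        print(h.chrom, h.start, h.end, h.te, h.score)
```

With this toy input, only `TE_A` (100-500) and the isolated `TE_B` hit (1000-1200) survive. Note the trade-off of the greedy rule: `TE_C` is dropped entirely because it touches `TE_A`, even though bases 500-900 are covered by nothing else. The all-against-kept overlap check is also quadratic in the worst case, which is fine for a sketch but slow for genome-wide data.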
Below is an example seen in the genome browser: the darker, longer bar represents the repeat that best matches this genomic position. I want to select that hit from among all the overlapping ones.
RepeatMasker is usually used for this kind of analysis, but I'm not fully satisfied with the results I get, and the way it works is something of a black box. I'm considering a sliding-window approach to determine, at each base, which TE hit is the best among the candidates, but I have no idea how to do that (and lack the skills to do it!).
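The per-base idea can be sketched as a scan with a window of a single base. For every position, record the best-scoring hit that covers it, then merge consecutive positions that share the same winning hit into segments. This is a minimal illustration under stated assumptions, not an established tool. It assumes the hits of one chromosome are already parsed into 0-based, half-open coordinates with a "higher is better" `score`. `Hit` and `per_base_best` are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """One TE match on a single chromosome (hypothetical record layout)."""
    start: int    # 0-based, inclusive (assumed)
    end: int      # exclusive
    te: str       # name of the TE from the library
    score: float  # assumed: higher means a better match


def per_base_best(hits: List[Hit]) -> List[Tuple[int, int, str]]:
    """Return (start, end, te) segments in which every base is assigned
    to the best-scoring hit that covers it. Bases with no hit are skipped."""
    if not hits:
        return []
    length = max(h.end for h in hits)
    best_score = [float("-inf")] * length
    best_hit: List[Optional[int]] = [None] * length  # index into `hits`

    # Per-base vote: a hit claims a base only if it beats the current winner.
    # Strict '>' means that on ties the hit listed first keeps the base.
    for i, h in enumerate(hits):
        for pos in range(h.start, h.end):
            if h.score > best_score[pos]:
                best_score[pos] = h.score
                best_hit[pos] = i

    # Merge runs of bases won by the same hit (by index, so two separate
    # hits of the same TE are not fused into one segment).
    segments: List[Tuple[int, int, str]] = []
    seg_start = 0
    for pos in range(1, length + 1):
        if pos == length or best_hit[pos] != best_hit[seg_start]:
            idx = best_hit[seg_start]
            if idx is not None:
                segments.append((seg_start, pos, hits[idx].te))
            seg_start = pos
    return segments


if __name__ == "__main__":
    hits = [
        Hit(100, 500, "TE_A", 350.0),
        Hit(120, 300, "TE_B", 120.0),
        Hit(450, 900, "TE_C", 200.0),
        Hit(1000, 1200, "TE_B", 90.0),
    ]
    print(per_base_best(hits))
    # → [(100, 500, 'TE_A'), (500, 900, 'TE_C'), (1000, 1200, 'TE_B')]
```

Unlike a "keep or drop the whole hit" filter, this keeps the part of `TE_C` that no better hit covers. The price is one list entry per base, so for real chromosomes you would process one chromosome at a time, or switch to an interval sweep over hit boundaries. You may also want to discard very short leftover segments afterwards, with a threshold of your choice.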
Thanks a lot for your help and advice,