I have very large repeat arrays assembled from several samples, and the idea is to find novel material that has been inserted into these arrays. The material only has to be novel relative to a simple tandemly repeated array without any insertion events.
I have tried some alignment methods (aligning against a reference-based repeat array and looking for gaps in the alignment or an absence of coverage), but due to the large number of repeats and their degeneracy, these methods have their flaws and require further refinement downstream of the locations of interest.
I was wondering if anyone has some simple ideas for solving this problem. Is there a simple way to find all regions that are not similar to a given sequence (one repeat)? I feel like there must be something, but I am currently drawing a blank.
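To make the question concrete, here is a rough sketch of the kind of alignment-free approach I have in mind: mark every position of the array that is covered by a k-mer from the repeat unit, then report the uncovered stretches. The function names (`repeat_kmers`, `novel_regions`) and the defaults for `k` and `min_len` are my own placeholders, not from any existing tool. It assumes plain strings on one strand (no reverse complement), and exact k-mer matching will only tolerate degeneracy to the extent that `min_len` filters out short uncovered runs caused by point mutations.

```python
def repeat_kmers(repeat, k):
    """Return all k-mers of the repeat unit, including those that span
    the junction between two tandem copies (circular k-mers)."""
    doubled = repeat + repeat[:k - 1]
    return {doubled[i:i + k] for i in range(len(repeat))}


def novel_regions(array, repeat, k=8, min_len=20):
    """Return 0-based, half-open (start, end) intervals of `array` that are
    not covered by any k-mer of `repeat` and are at least `min_len` long.

    Assumption: single strand only; k and min_len are illustrative defaults.
    """
    kmers = repeat_kmers(repeat.upper(), k)
    array = array.upper()

    # Mark every base that lies inside at least one repeat-derived k-mer.
    covered = [False] * len(array)
    for i in range(len(array) - k + 1):
        if array[i:i + k] in kmers:
            for j in range(i, i + k):
                covered[j] = True

    # Collect runs of uncovered bases that pass the length filter.
    regions = []
    start = None
    for pos, is_covered in enumerate(covered):
        if not is_covered and start is None:
            start = pos
        elif is_covered and start is not None:
            if pos - start >= min_len:
                regions.append((start, pos))
            start = None
    if start is not None and len(array) - start >= min_len:
        regions.append((start, len(array)))
    return regions
```

Something like `novel_regions(array_seq, repeat_seq, k=12, min_len=50)` would then give candidate insertion intervals, which could be checked against the alignment-based results. I am not sure how well this holds up with highly degenerate repeats, which is part of what I am asking.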
Thanks in advance.