Hi! We are going to process some NGS data in order to find splice junctions. Low-complexity and repetitive sequences in reads often generate false-positive splice junctions. For this reason I need to remove all reads that contain low-complexity sequence.
Probably the most common way to do this is with RepeatMasker or Dustmasker, but I think that would take a very long time because my NGS data set is very large. Another option is to map the reads (with BFAST or another mapper) to Repbase and keep only the unmapped reads. I think Bowtie isn't an option because I want a sensitive filter.
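To make the question more concrete, here is a rough sketch of the kind of lightweight per-read filter I could imagine as an alternative to the tools above. It computes a DUST-style triplet score for each read and drops reads that score above a cutoff. This is not the actual Dustmasker algorithm. The function names `dust_score` and `filter_fastq` and the cutoff `max_score=2.0` are my own placeholders, and the cutoff would need tuning on real data.

```python
from collections import Counter
from itertools import islice


def dust_score(seq):
    """DUST-style score: higher means lower complexity.

    Counts overlapping triplets and sums c*(c-1)/2 over the triplet
    counts, normalised by (number of triplets - 1).
    """
    seq = seq.upper()
    n_triplets = len(seq) - 2
    if n_triplets < 2:
        return 0.0  # too short to score meaningfully
    counts = Counter(seq[i:i + 3] for i in range(n_triplets))
    return sum(c * (c - 1) / 2 for c in counts.values()) / (n_triplets - 1)


def filter_fastq(lines, max_score=2.0):
    """Yield 4-line FASTQ records whose read scores at or below max_score.

    `lines` is any iterable of text lines (for example an open file).
    `max_score` is a hypothetical cutoff, not a recommended value.
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        if dust_score(record[1].strip()) <= max_score:
            yield record
```

For example, writing `"".join(line for rec in filter_fastq(open("reads.fastq")) for line in rec)` to a new file would keep only the reads that pass (`reads.fastq` is a made-up file name). Note that this score grows with read length for repetitive reads, so the cutoff depends on the read length, and it only catches low-complexity sequence, not interspersed repeats of the kind found in Repbase.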
Thanks for your time. I look forward to your suggestions.