I am wondering how I can quantify the amount of repeated sequence across a FASTA file.
For example:
This sequence contains a repeat of ACT in more than 40% of the total sequence.
I am looking at the Fourier transform and entropy.
Do you have any other ideas?
This will tell you the fraction of the genome with entropy below 0.7 when using a window of 80 and 5-mers. The code for the entropy masking of a sequence is in /bbmap/current/jgi/BBMask.java at lines 1067-1144. The entropy calculation over a window is very short, at lines 1158-1168.
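If you want to see the idea without reading the Java, here is a rough Python sketch of a windowed k-mer entropy scan. It is not BBMask's code and may not reproduce its numbers. The function names are made up for illustration, and two choices are my own assumptions: entropy is normalized by log2 of the number of k-mers in the window (capped at 4^k), and a base counts as low-entropy if any window covering it falls below the cutoff.

```python
import math
from collections import Counter


def window_entropy(window, k=5):
    """Shannon entropy of the k-mer counts in `window`, scaled to [0, 1].

    Assumption: the maximum is log2(min(number of k-mers, 4**k)).
    """
    n = len(window) - k + 1  # number of k-mers in the window
    if n <= 1:
        return 0.0
    counts = Counter(window[i:i + k] for i in range(n))
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(min(n, 4 ** k))


def low_entropy_fraction(seq, window=80, k=5, cutoff=0.7):
    """Fraction of bases covered by at least one window below `cutoff`."""
    seq = seq.upper()
    if len(seq) < window:
        return 0.0
    low = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window], k) < cutoff:
            # Mark every base in this low-entropy window.
            for i in range(start, start + window):
                low[i] = True
    return sum(low) / len(seq)


if __name__ == "__main__":
    # A pure ACT repeat has only three distinct 5-mers per window.
    print(low_entropy_fraction("ACT" * 100))
```

This version recomputes the counts for every window, which is fine for short sequences but slow on a genome. For real data you would update the counts incrementally as the window slides, and read the sequences from the FASTA file record by record.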