I am trying to get a comprehensive list of simple repeats (mono-, di-, tri-, and tetranucleotide) in the human genome (hg19). I downloaded simpleRepeat.txt.gz from UCSC, but it seems to be missing some of the repeats we are interested in. For example, chr1:981861-981868[CCCCCCCC] and chr1:1116223-1116230[GGGGGGGG] are mononucleotide repeats we want to look at, but neither is in the UCSC list. I therefore tried to generate a list myself using TRF, but with the default parameters TRF also failed to report some of these repeats, e.g., chr1:981861-981868[CCCCCCCC]. Can someone provide some insight here:
- Is there anywhere I can download a truly comprehensive list of simple repeats?
- If the answer to the first question is no, what would be the best way to curate such a list? Is running a tool like TRF or RepeatMasker a good idea?
- If TRF is what you would suggest, how should I make it report the mononucleotide repeats that I was missing with the default parameters?
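
To make concrete what kind of repeat I am after, here is a naive sketch of a scan for perfect tandem repeats with a unit of 1 to 4 bases. This is only an illustration, not how TRF or the UCSC track works. The function name `find_simple_repeats` and the thresholds `max_unit=4` and `min_len=8` are my own assumptions, chosen so that an 8 bp run such as CCCCCCCC would be reported; coordinates are assumed to be 1-based and inclusive, and I have not tested this at whole-genome scale.

```python
import re

def find_simple_repeats(seq, max_unit=4, min_len=8):
    """Return (start, end, unit, copies) for perfect tandem repeats.

    Coordinates are 1-based, inclusive. Only exact repeats are found;
    interrupted or imperfect repeats are ignored in this sketch.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        # One unit of unit_len bases followed by at least one more copy of it.
        pattern = re.compile(r"([ACGT]{%d})\1+" % unit_len)
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are copies of a shorter unit
            # (e.g. "CC" is really the mononucleotide "C").
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            length = m.end() - m.start()
            if length >= min_len:
                hits.append((m.start() + 1, m.end(), unit, length // unit_len))
    return sorted(hits)

# Toy sequence: an 8 bp C run and an (AC)x4 repeat with short flanks.
example = "ATG" + "CCCCCCCC" + "TAG" + "ACACACAC" + "AGT"
for start, end, unit, copies in find_simple_repeats(example):
    print(start, end, unit, copies)
```

Something like this would catch the short homopolymer runs listed above, but it only finds exact repeats, which is why I would still prefer an established tool or a curated list if one exists.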