29 days ago
giova34 • 0

I'm looking for a tool to search for repeats that meet an unusual set of criteria. Specifically, I'm trying to find pseudo-repeats that are:

  1. ~ 1000-2000 bp long
  2. interspersed fairly rarely (~ once every 1-20 Mb), which means they cannot be tandemly repeated!
  3. chromosome specific

These repeats need not be identical (70-80% sequence identity is OK). The motivation is that I'd like to develop chromosome-specific DNA-FISH probes that are degenerate, that is, probes that label multiple regions of the same chromatin fiber.

I gave RepeatMasker a shot, but it isn't well suited to this kind of search. Recommendations greatly appreciated!


Thanks for accepting my answer. Coming back to the initial task of coloring chromosomes, I have the impression that you want to do multicolor FISH (mFISH). You didn't mention your species of interest, but I think that for human and some model organism genomes this has long been solved, see here. Your best bet might be sub-centromeric sequences.

29 days ago

I cannot give you a comprehensive answer, because I haven't seen this question come up anywhere before and I am not sure you are approaching the problem in the correct way. I am not even sure that this type of chromosome-specific repeat exists. Given how mobile genetic elements work, they might have little, if any, preference for inserting themselves into the same chromosome they came from. You might still be able to test this using a de novo detection approach, outlined as follows:

  1. Divide the genome into chromosomes and run RepeatModeler to generate de novo repeat families for each chromosome separately.
  2. Optionally remove the most frequent and conserved repeat families (Tc1/Mariner, etc.), since they are unlikely to be chromosome specific.
  3. BlastN the repeat families from each chromosome against the repeat families from all other chromosomes, and look for those that have no hits on any other chromosome, or at least none on some of them.
  4. Optionally use the repeat families from each chromosome in RepeatMasker to mask all other chromosomes, and look for those with the lowest or zero frequency.
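
The filtering in step 3 could be sketched roughly as below. This is only an illustration under a few assumptions that are not part of the steps above: the BlastN results are in the default tabular layout (`-outfmt 6`), every family ID has been renamed to carry its chromosome as a prefix (a hypothetical `chr1|famA` convention), and the 200 bp minimum alignment length is an arbitrary placeholder. The 70% identity cutoff is taken from the question. The example rows are made up.

```python
import csv
import io
from collections import defaultdict


def chrom_of(seq_id, sep="|"):
    """Chromosome prefix of an ID such as 'chr1|famA' (hypothetical naming scheme)."""
    return seq_id.split(sep, 1)[0]


def cross_chrom_hits(blast_tab, min_pident=70.0, min_len=200):
    """Map each query family to the set of OTHER chromosomes it has hits on.

    blast_tab: iterable of lines in BLAST tabular format (-outfmt 6), where
    column 3 is percent identity and column 4 is alignment length.
    min_len is an assumed threshold, not a recommended value.
    """
    hits = defaultdict(set)
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        qseqid, sseqid = row[0], row[1]
        pident, length = float(row[2]), int(row[3])
        if pident < min_pident or length < min_len:
            continue  # too weak to count as a shared repeat
        q_chrom, s_chrom = chrom_of(qseqid), chrom_of(sseqid)
        if q_chrom != s_chrom:
            hits[qseqid].add(s_chrom)
    return hits


def specific_candidates(all_families, hits):
    """Families without any qualifying hit on another chromosome."""
    return sorted(f for f in all_families if not hits.get(f))


# Made-up example rows, for illustration only.
example = io.StringIO(
    "chr1|famA\tchr1|famA\t100.0\t1500\t0\t0\t1\t1500\t1\t1500\t0.0\t2771\n"
    "chr1|famB\tchr2|famC\t82.5\t900\t150\t5\t1\t900\t10\t905\t1e-150\t800\n"
    "chr2|famC\tchr1|famB\t82.5\t900\t150\t5\t10\t905\t1\t900\t1e-150\t800\n"
    "chr2|famD\tchr1|famA\t65.0\t300\t100\t4\t1\t300\t1\t300\t1e-5\t60\n"
)
families = ["chr1|famA", "chr1|famB", "chr2|famC", "chr2|famD"]
hits = cross_chrom_hits(example)
candidates = specific_candidates(families, hits)
print(candidates)
```

A family that passes this filter is only a candidate: comparing family against family can miss copies that RepeatModeler did not turn into a family on the other chromosomes, which is why the masking check in step 4 is still worth doing.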

This is likely an exercise in futility, but who knows. Maybe, if some repeats are found in only some chromosomes, a combination approach could work.
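
One way to read "combination approach" (my interpretation, not something I have tested) is combinatorial labelling: even if no family is unique to a single chromosome, the set of families present on each chromosome might still be unique. A small sketch for checking that, using a made-up presence table and hypothetical family names:

```python
from collections import defaultdict


def chromosome_signatures(family_to_chroms, chromosomes):
    """For each chromosome, the set of repeat families found on it."""
    sig = {c: set() for c in chromosomes}
    for family, chroms in family_to_chroms.items():
        for c in chroms:
            if c in sig:
                sig[c].add(family)
    return {c: frozenset(s) for c, s in sig.items()}


def ambiguous_groups(signatures):
    """Groups of chromosomes that share the same non-empty family set."""
    by_sig = defaultdict(list)
    for chrom, s in signatures.items():
        if s:  # chromosomes without any family are reported separately
            by_sig[s].append(chrom)
    return sorted(sorted(g) for g in by_sig.values() if len(g) > 1)


def unlabeled_chromosomes(signatures):
    """Chromosomes on which none of the families occur."""
    return sorted(c for c, s in signatures.items() if not s)


# Made-up presence/absence data, for illustration only.
presence = {
    "famX": {"chr1", "chr2"},
    "famY": {"chr2", "chr3"},
    "famZ": {"chr4", "chr5"},
}
chroms = ["chr1", "chr2", "chr3", "chr4", "chr5", "chr6"]

signatures = chromosome_signatures(presence, chroms)
ambiguous = ambiguous_groups(signatures)
unlabeled = unlabeled_chromosomes(signatures)
print(ambiguous, unlabeled)
```

In this toy table chr1, chr2 and chr3 each get a distinct combination, chr4 and chr5 cannot be told apart, and chr6 carries no label at all, so more families would be needed for those three.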

Of course there should be enough chromosome-specific sequences in coding regions, so why not use these instead?


