I have been mapping chromosome translocation breakpoints in a human dataset and have made a series of observations about sequence features that these recombination events have in common. However, I do not know how to generate 'control' values against which to compare my observations. Ensembl carries data on gene numbers and coding sequence, but I would also like to know whether similar data can be obtained for the proportion of the human genome that is repetitive DNA (Alu, etc.) or heterochromatin, and for the frequency of homopolymeric tracts and inverted repeats. Could anyone direct me to databases, information, or software that would help me obtain approximate figures for these values? Thanks, Kate
There is also the "rmsk.txt.gz" MySQL dump, which gives similar information.
Indeed, and that's probably simpler to use.
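For what it's worth, here is a minimal sketch of how the dump could be tallied by repeat class. It assumes the tab-separated UCSC rmsk column layout (genoStart, genoEnd, and repClass at 0-based positions 6, 7, and 11); please check that against the table schema that accompanies your copy of the dump. The function names and the file path argument are hypothetical.

```python
import gzip
from collections import defaultdict

# Assumed 0-based column positions in the rmsk dump; verify against
# the schema for the assembly you are using.
GENO_START, GENO_END, REP_CLASS = 6, 7, 11


def bases_per_repeat_class(lines):
    """Sum annotated bases per repeat class from tab-separated rmsk rows."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= REP_CLASS:
            continue  # skip blank or truncated rows
        length = int(fields[GENO_END]) - int(fields[GENO_START])
        totals[fields[REP_CLASS]] += length
    return dict(totals)


def summarize_dump(path):
    """Tally a gzipped dump at a (hypothetical) local path."""
    with gzip.open(path, "rt") as handle:
        return bases_per_repeat_class(handle)
```

Dividing each total by the length of the assembly you are working with gives an approximate genome fraction per class. Nested or overlapping annotations, if present, would be counted more than once, so treat the figures as rough.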
Thanks so much. I have been using BED tracks based on RepeatMasker to annotate my sequence anyway, so I guess it makes sense to use the same source to extract and calculate the frequencies of events.
Thanks for replying,
Kate
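In case it is useful for the BED route, here is a rough sketch of computing how many bases a set of BED intervals covers per chromosome, merging overlaps first so that no base is counted twice. It relies only on the standard BED convention of 0-based, half-open coordinates in the first three columns. The function names and the `chrom_sizes` dictionary are hypothetical; the chromosome lengths would have to come from whichever assembly you are using.

```python
def merged_coverage(bed_lines):
    """Return bases covered per chromosome after merging overlapping intervals."""
    by_chrom = {}
    for line in bed_lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom.setdefault(chrom, []).append((int(start), int(end)))

    covered = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current merged block.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
        covered[chrom] = total
    return covered


def fraction_covered(covered, chrom_sizes):
    """Divide covered bases by chromosome length (chrom_sizes is user-supplied)."""
    return {c: covered[c] / chrom_sizes[c] for c in covered if c in chrom_sizes}
```

The resulting fractions could serve as the genome-wide 'control' values to set against the proportion of breakpoints that fall within repeats.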