5.1 years ago
CY
I am trying to figure out what contributes to regions of low mappability in the human genome. As far as I can tell, repetitive regions and homologous regions can cause them. Are there any other causes? Thanks
Low-complexity regions as well. In general, this is a good source of information on your problem: http://www.repeatmasker.org/webrepeatmaskerhelp.html
I think LCR (low-complexity region) and repetitive region are used interchangeably.
Well, the doc I sent explains why that is not true =)
"By default, along with the interspersed repeats, RepeatMasker masks low-complexity DNA. Simple repeats (micro-satellites) can originate at any site in the genome, and therefore have an interspersed character. Other low-complexity DNA, primarily poly-purine/ poly-pyrimidine stretches, or regions of extremely high AT or GC content will result in spurious matches in some database searches as well (especially in the ungapped BLASTN searches). For example, extremely AT-rich regions consistently will give very low probability matches to mitochondrial DNA in BLASTN searches. The settings are very stringent, and we think that few if any sequences informative in database searches are masked as low-complexity DNA. However, one may opt to skip the low-complexity masking, for example when using RepeatMasker in conjunction with a gene prediction program."
Got it. So by definition, a low-complexity region is a region of continuous repeats, as opposed to interspersed repeats, right?
TTTTATTATAAATTAATAAAAATTATATATATATAAAATTTAAAAATTATA - this is an example of a low-complexity region; its GC content is literally 0. You can look at it from the point of view of "repeats", but then the question is "what is actually repeating here?". I think there is a better explanation here: Complex Genomic Regions
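To make the "low complexity" point concrete, here is a minimal Python sketch that computes the GC content and the Shannon entropy of the example sequence. The function names `gc_content` and `shannon_entropy` are my own for illustration, and entropy is just one common way to quantify complexity; I am not claiming this is how RepeatMasker decides what to mask.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def shannon_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the k-mer distribution of seq.

    Illustrative complexity measure only: low values mean that a few
    k-mers dominate the sequence.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


example = "TTTTATTATAAATTAATAAAAATTATATATATATAAAATTTAAAAATTATA"

print("GC content:", gc_content(example))
print("Entropy (k=1):", shannon_entropy(example, k=1))
print("Entropy (k=3):", shannon_entropy(example, k=3))
```

Since the sequence uses only A and T, its single-base entropy cannot exceed 1 bit, whereas a sequence that uses all four bases equally reaches 2 bits. No single unit is repeating in tandem here, yet the composition is still very biased, which is the distinction from a simple repeat.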