Question

Composition of low-mappability regions in the human genome

0

5.1 years ago

CY ▴ 750

I am trying to work out which factors contribute to low-mappability regions in the human genome. As far as I can tell, repetitive regions and homologous regions can cause them. Are there any other causes that contribute to this? Thanks.

mappability repetitive region homologous region • 1.2k views

ADD COMMENT • link updated 5.0 years ago by Biostar 20 • written 5.1 years ago by CY ▴ 750

0

Also low-complexity regions. In general, this is a good source of information on your problem: http://www.repeatmasker.org/webrepeatmaskerhelp.html

ADD REPLY • link 5.1 years ago by German.M.Demidov ★ 2.9k

0

I think LCR and repetitive region are used interchangeably.

ADD REPLY • link 5.1 years ago by CY ▴ 750

0

Well, the doc I linked explains why that is not true =)

"By default, along with the interspersed repeats, RepeatMasker masks low-complexity DNA. Simple repeats (micro-satellites) can originate at any site in the genome, and therefore have an interspersed character. Other low-complexity DNA, primarily poly-purine/ poly-pyrimidine stretches, or regions of extremely high AT or GC content will result in spurious matches in some database searches as well (especially in the ungapped BLASTN searches). For example, extremely AT-rich regions consistently will give very low probability matches to mitochondrial DNA in BLASTN searches. The settings are very stringent, and we think that few if any sequences informative in database searches are masked as low-complexity DNA. However, one may opt to skip the low-complexity masking, for example when using RepeatMasker in conjunction with a gene prediction program."
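To make the two kinds of "other low-complexity DNA" in that quote concrete, here is a rough Python sketch. It is not RepeatMasker's algorithm: the function names, the 20 bp window, the 0.9 threshold and the 12 bp minimum run length are all arbitrary values assumed for illustration.

```python
import re

def extreme_composition_windows(seq, window=20, threshold=0.9):
    """Return (start, end, kind) for every window whose AT or GC fraction
    is at least `threshold`. Window size and threshold are illustrative
    assumptions, not RepeatMasker settings."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        at = sum(base in "AT" for base in chunk) / window
        gc = sum(base in "GC" for base in chunk) / window
        if at >= threshold:
            hits.append((start, start + window, "AT"))
        elif gc >= threshold:
            hits.append((start, start + window, "GC"))
    return hits

def purine_pyrimidine_runs(seq, min_len=12):
    """Return (start, end, run) for stretches made only of purines (A/G)
    or only of pyrimidines (C/T) that are at least `min_len` bases long.
    The minimum length is an illustrative assumption."""
    pattern = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (min_len, min_len))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]

if __name__ == "__main__":
    toy = "ACGT" * 5 + "AT" * 10 + "ACGT" + "AG" * 6 + "CATG"
    print(extreme_composition_windows(toy))
    print(purine_pyrimidine_runs(toy))
```

Neither function looks for a repeating unit; both only look at base composition, which is the distinction the quoted passage draws.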

ADD REPLY • link 5.1 years ago by German.M.Demidov ★ 2.9k

0

Got it. So by definition, a low-complexity region belongs to the class of continuous repeats, as opposed to interspersed repeats, right?

ADD REPLY • link 5.1 years ago by CY ▴ 750

1

TTTTATTATAAATTAATAAAAATTATATATATATAAAATTTAAAAATTATA - this is an example of a low-complexity region; its GC content is literally 0. You can look at it from the point of view of "repeats", but the question is what is actually repeating here. I think there is a better explanation here: Complex Genomic Regions
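As a quick check of that example, here is a minimal Python sketch that measures its composition and tests whether any short unit repeats exactly. The function names and the 6 bp maximum unit length are assumptions made for illustration; this is a toy check, not how any masking tool defines low complexity.

```python
import math
from collections import Counter

EXAMPLE = "TTTTATTATAAATTAATAAAAATTATATATATATAAAATTTAAAAATTATA"

def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def shannon_entropy(seq):
    """Per-base Shannon entropy in bits (the maximum for DNA is 2.0)."""
    counts = Counter(seq.upper())
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def smallest_repeat_unit(seq, max_unit):
    """Return the shortest unit (at most max_unit bases, and at most half
    the sequence length) whose exact tandem repetition reproduces seq,
    or None if no such unit exists."""
    for period in range(1, min(max_unit, len(seq) // 2) + 1):
        if all(seq[i] == seq[i + period] for i in range(len(seq) - period)):
            return seq[:period]
    return None

if __name__ == "__main__":
    print("GC content:", gc_content(EXAMPLE))
    print("Entropy (bits):", shannon_entropy(EXAMPLE))
    print("Exact repeat unit (<= 6 bp):", smallest_repeat_unit(EXAMPLE, 6))
```

The GC content comes out as 0, the entropy cannot exceed 1 bit because only A and T occur, and no exact repeat unit of up to 6 bases is found. So the sequence is compositionally simple without being a clean tandem repeat of anything.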

ADD REPLY • link 5.1 years ago by German.M.Demidov ★ 2.9k