8.7 years ago by
Athens, GA, USA
First, the centromeres are not included in the reference human genome assembly. They are unclonable and unsequenceable by traditional genome sequencing methods. This is why you have a blank patch with no reads mapping in the center of chromosome 7 (a metacentric, i.e. "middle center", chromosome). The centromere is a physical gap in the assembly and is represented as a long stretch of N's.
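If you want to confirm where such a gap sits, a minimal sketch is to scan the chromosome sequence for long runs of N's. The function name `find_n_runs` and the `min_length` threshold are my own choices for illustration, and the sequence below is a toy string rather than real chromosome 7 data:

```python
import re

def find_n_runs(sequence, min_length=1000):
    """Return 0-based, half-open (start, end) coordinates of every
    run of at least min_length consecutive N's in the sequence."""
    pattern = re.compile(r"[Nn]{%d,}" % min_length)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]

# Toy example: 20 bases, a 50 bp gap, then 20 more bases.
toy = "ACGT" * 5 + "N" * 50 + "TTGCA" * 4
print(find_n_runs(toy, min_length=10))  # [(20, 70)]
```

On a real assembly you would first read the chromosome's sequence out of the FASTA file into a single string; the centromere should then show up as one of the longest intervals reported.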
What you are actually seeing is reads preferentially mapping to the pericentromeric region. In many species, the pericentromeric region is enriched in transposable elements and simple sequence repeats, because these regions are gene-poor and/or have low recombination rates. Therefore, what is probably going on here is that you have reads mapping to repetitive sequences in the pericentromeric region. Why this would be happening with BWA on defaults, which should assign each repetitive read randomly to one of the repeat copies, is a bit curious. Nevertheless, to see if this solves your problem, try using the XT tag in the BWA output to filter out all repetitive reads (XT:A:R) and keep only the uniquely mapped ones (XT:A:U).
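As a rough sketch of that filter, the Python below keeps SAM header lines plus any alignment carrying `XT:A:U` and drops everything else. It assumes the SAM came from `bwa aln` with `samse`/`sampe`, which is where the XT tag is written; the function name `keep_unique_reads` and the toy records are made up for illustration:

```python
def keep_unique_reads(sam_lines):
    """Yield SAM header lines and alignments tagged XT:A:U (unique).

    Alignments tagged XT:A:R (repeat), or carrying no XT tag at all,
    are dropped.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        optional_tags = fields[11:]  # SAM has 11 mandatory columns
        if "XT:A:U" in optional_tags:
            yield line

# Toy records, not real data: one unique read and one repeat read.
toy_sam = [
    "@SQ\tSN:chr7\tLN:159138663\n",
    "r1\t0\tchr7\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:U\tNM:i:0\n",
    "r2\t0\tchr7\t200\t0\t4M\t*\t0\t0\tACGT\tIIII\tXT:A:R\tNM:i:0\n",
]
for record in keep_unique_reads(toy_sam):
    print(record, end="")
```

For a real file, pass an open file handle in place of `toy_sam` and write the yielded lines to a new SAM. If the pile-up near the centromere disappears after filtering, repetitive reads were the cause.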