We have identified certain regions of the genome with significantly low coverage across individuals. One of them has high GC content (70%), which might be one explanation, but I am wondering what other reasons could prevent good sequencing or alignment in such regions. The question can be generalized as: "What characteristics of a genomic region make it so hard to sequence and align that reliable genotype calling can't be performed with Illumina next-gen sequencing?" Other platforms have their own problems, but we are interested in issues related to the Illumina platform.
Some issues we hypothesize to be important are:
- Paralogous regions
- Repeats
- Segmental duplications
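For the GC content explanation mentioned above, here is a minimal sketch of how one might flag windows with extreme GC content in a reference sequence, so they can be compared against the low-coverage regions. The function names, the window and step sizes, and the 30%/65% thresholds are all illustrative assumptions, not established cutoffs.

```python
def gc_fraction(seq):
    """Return the fraction of G/C among called bases (A, C, G, T); N is ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0  # window made up entirely of uncalled bases
    return (seq.count("G") + seq.count("C")) / called


def flag_extreme_gc_windows(seq, window=100, step=50, low=0.30, high=0.65):
    """Slide a window along seq and report (start, end, gc) for windows
    whose GC fraction falls outside [low, high].

    The thresholds are hypothetical defaults chosen for illustration only.
    """
    flagged = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        gc = gc_fraction(chunk)
        if gc < low or gc > high:
            flagged.append((start, start + len(chunk), gc))
    return flagged


if __name__ == "__main__":
    # Toy sequence: an AT-rich stretch followed by a GC-rich stretch.
    toy = "AT" * 50 + "GC" * 50
    for start, end, gc in flag_extreme_gc_windows(toy, window=100, step=100):
        print(f"{start}-{end}\tGC={gc:.2f}")
```

Coordinates are 0-based and half-open, so the flagged windows can be written out as BED-style intervals and intersected with the low-coverage regions.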
Are there other things we should add to this list of things to check? Thank you.