Question: Missing SNP Variants In Two 10Mb Blocks On Chr1 And Chr9 In 1000 Genomes Data?
6.7 years ago by
Zhenyu Zhang270
United States
Zhenyu Zhang270 wrote:

I recently wanted to use Mach to impute some SNPs based on 1000 Genomes data. Following the instructions, I chunked the reference genome into 10 Mb blocks for easier handling and was surprised to find that two blocks contain no SNPs at all: one on Chr1 at 130M-140M, the other on Chr9 at 50M-60M. I went back to the allelic.Info file and found that there really are large regions without any SNP data. My questions are:

  1. Is this normal, or is something wrong with Mach's 1000G phaseI v3 file?
  2. What is special about these regions? Are they difficult to sequence, or do they contain too much variation?


1000genomes variant snp • 1.6k views
modified 3.7 years ago by Biostar ♦♦ 20 • written 6.7 years ago by Zhenyu Zhang270
6.7 years ago by
United States
chrchang5237.4k wrote:
  1. This is normal.
  2. They're centromeres (together with the adjacent heterochromatin), which are highly repetitive and therefore difficult to sequence cost-effectively. Genome Reference Consortium build 37, which is what the Mach file is based on, simply has large gaps representing these regions. Build 38 attempts to model the centromeres.
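
If you want to confirm this on your own data, a minimal Python sketch along the following lines could list the blocks with no SNPs and check whether each one falls inside a known assembly gap. The function names `empty_blocks` and `covered_by_gap` are hypothetical, and the sketch assumes you have already extracted the SNP positions for one chromosome and obtained the gap intervals for the same build from a gap annotation; parsing the Mach file is left out because its layout is not assumed here.

```python
from collections import Counter

BLOCK_SIZE = 10_000_000  # 10 Mb, matching the chunk size used in the question


def empty_blocks(positions, chrom_length, block_size=BLOCK_SIZE):
    """Return 1-based inclusive (start, end) ranges of blocks with no SNPs.

    positions    -- iterable of 1-based SNP positions on one chromosome
    chrom_length -- chromosome length in bases (supplied by the caller)
    """
    # Count SNPs per block; block i covers i*block_size+1 .. (i+1)*block_size.
    counts = Counter((pos - 1) // block_size for pos in positions)
    n_blocks = (chrom_length + block_size - 1) // block_size  # ceiling division
    empty = []
    for i in range(n_blocks):
        if counts[i] == 0:
            start = i * block_size + 1
            end = min((i + 1) * block_size, chrom_length)
            empty.append((start, end))
    return empty


def covered_by_gap(block, gaps):
    """True if the block lies entirely inside one of the gap intervals.

    gaps -- list of 1-based inclusive (start, end) intervals, assumed to come
            from a gap annotation for the same reference build.
    """
    start, end = block
    return any(g_start <= start and end <= g_end for g_start, g_end in gaps)


if __name__ == "__main__":
    # Toy numbers only, not real coordinates: a 40-base "chromosome"
    # split into 10-base blocks, with one annotated gap.
    toy_positions = [5, 15, 38, 39]
    toy_gaps = [(18, 33)]
    for block in empty_blocks(toy_positions, chrom_length=40, block_size=10):
        print(block, "inside a gap:", covered_by_gap(block, toy_gaps))
```

An empty block that is not covered by any gap interval would be worth a closer look, since it would not be explained by the reference assembly alone.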

Thanks a lot. This makes a lot of sense.

written 6.7 years ago by Zhenyu Zhang270
