Question: Missing SNP Variants In Two 10Mb Blocks On Chr1 And Chr9 In 1000 Genomes Data?
Zhenyu Zhang (United States) wrote, 5.7 years ago:

I recently wanted to use Mach to impute some SNPs based on 1000 Genomes data. Following the instructions, I chunked the reference genome into 10 Mb blocks for easier handling, and surprisingly two blocks contain no SNPs at all: one on Chr1 at 130M-140M, the other on Chr9 at 50M-60M. I went back to the corresponding allelic.Info file and found that there really are large regions without any SNP data. My questions are:

  1. Is this normal, or is something wrong with Mach's 1000G phaseI v3 file?
  2. What is special about these regions? Are they difficult to sequence, or do they have too much variation?

Thanks

1000genomes variant snp
chrchang523 (United States) wrote, 5.7 years ago:
  1. This is normal.
  2. They're centromeric regions (the centromeres and their flanking heterochromatin), which are highly repetitive and therefore difficult to sequence cost-effectively. Genome Reference Consortium build 37, which is what the Mach file is based on, simply has large gaps there, so no variants can be called in them. Build 38 attempts to model the centromeres.
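If you want to check this yourself, here is a minimal Python sketch that counts SNPs per 10 Mb block and tests whether each empty block overlaps an assembly gap. The function names, the toy positions, and the toy gap interval are all made up for illustration; they are not real chr1 or chr9 coordinates. For a real check you would read positions from your own reference files and take gap intervals from a build 37 gap annotation.

```python
BLOCK_SIZE = 10_000_000  # 10 Mb blocks, as in the question


def snp_counts_per_block(positions, chrom_length, block_size=BLOCK_SIZE):
    """Count SNPs in each consecutive block; positions are 1-based."""
    n_blocks = -(-chrom_length // block_size)  # ceiling division
    counts = [0] * n_blocks
    for pos in positions:
        counts[(pos - 1) // block_size] += 1
    return counts


def empty_blocks(positions, chrom_length, block_size=BLOCK_SIZE):
    """Return (start, end) of every block with no SNPs (0-based, half-open)."""
    counts = snp_counts_per_block(positions, chrom_length, block_size)
    return [
        (i * block_size, min((i + 1) * block_size, chrom_length))
        for i, count in enumerate(counts)
        if count == 0
    ]


def overlaps(block, gap):
    """True if two half-open intervals share at least one base."""
    return block[0] < gap[1] and gap[0] < block[1]


# Toy data only: a hypothetical 45 Mb chromosome with one hypothetical gap.
toy_length = 45_000_000
toy_positions = [1_200_000, 9_999_999, 10_000_001,
                 15_500_000, 31_000_000, 44_000_000]
toy_gaps = [(19_000_000, 31_000_000)]

for block in empty_blocks(toy_positions, toy_length):
    in_gap = any(overlaps(block, gap) for gap in toy_gaps)
    print(block, "overlaps a gap" if in_gap else "no known gap: investigate")
```

An empty block that overlaps an annotated gap is expected. An empty block with no gap nearby is the case that would point to a problem with the reference file.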

Thanks a lot. This makes a lot of sense.

Reply written 5.7 years ago by Zhenyu Zhang