Missing SNP Variants in Two 10 Mb Blocks on Chr1 and Chr9 in 1000 Genomes Data?
10.1 years ago
Zhenyu Zhang ★ 1.2k

I recently wanted to use Mach to impute some SNPs based on 1000 Genomes data. Surprisingly, when I followed the instructions to chunk the reference genome into 10 Mb blocks for easier handling, two blocks contained no SNPs at all: one on Chr1 at 130M-140M, the other on Chr9 at 50M-60M. So I went back to their allelic.Info file and found that there really are large regions without any SNP data. My questions are:

  1. Is this normal, or is something wrong with Mach's 1000G phaseI v3 file?
  2. What is special about these regions? Are they difficult to sequence, or do they have too much variation?

Thanks

snp variant 1000genomes • 2.4k views
10.1 years ago
  1. This is normal.
  2. They're centromeric regions, which are highly repetitive and therefore difficult to sequence cost-effectively. Genome Reference Consortium build 37 (GRCh37), which is what the Mach file is based on, simply has large gaps representing these regions. Build 38 attempts to model them.
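If you want to check this for your own chunks, here is a minimal Python sketch. It assumes you have already extracted the SNP positions for one chromosome into a list and have a list of assembly gap intervals (for example, exported from a genome browser's gap annotation for build 37). The function names `empty_blocks` and `overlaps_gap` are hypothetical helpers, and the coordinates below are toy values rather than real genome positions.

```python
from bisect import bisect_left


def empty_blocks(positions, chrom_length, block_size=10_000_000):
    """Return half-open (start, end) blocks that contain no SNP position."""
    positions = sorted(positions)
    blocks = []
    for start in range(0, chrom_length, block_size):
        end = min(start + block_size, chrom_length)
        # Index of the first SNP at or after the block start.
        i = bisect_left(positions, start)
        # The block is empty if no such SNP exists or it lies past the block end.
        if i == len(positions) or positions[i] >= end:
            blocks.append((start, end))
    return blocks


def overlaps_gap(block, gaps):
    """True if the half-open block overlaps any half-open gap interval."""
    start, end = block
    return any(g_start < end and start < g_end for g_start, g_end in gaps)


# Toy data (NOT real coordinates): a 50-unit chromosome with one gap.
toy_positions = [3, 7, 12, 18, 41, 45]
toy_gaps = [(20, 40)]

for block in empty_blocks(toy_positions, chrom_length=50, block_size=10):
    label = "overlaps a gap" if overlaps_gap(block, toy_gaps) else "unexplained"
    print(block, label)
```

An empty block that overlaps an annotated gap is expected. An empty block that does not overlap any gap would be worth a closer look.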
Thanks a lot. This makes a lot of sense.
