Question: How are indels reported in the Simons Diversity Project dataset?
3.2 years ago, terry.farrah wrote:

We have finished downloading the massive Simons Diversity Project dataset after a year's effort. It consists of one VCF per chromosome for 260 samples. Each VCF includes one line per nucleotide position. Thus, reference calls and no-calls are explicitly reported.

It appears to me that indels are not described in these VCFs. Are they described in other files, or are they simply not reported?

Below I illustrate with an insertion that is present in the pilot release of 25 genomes, but is absent from the VCF we just downloaded for the same sample.

Insertion 10:118916-118915 is seen in sample HGDP01284 in the VCF for the pilot project (to avoid visual clutter, only the first 7 fields are shown):

tabix HGDP01284.hg19_1000g.10.mod.vcf.gz 10:118915-118919

10 118915       .       A       .       78.14   .
10 118915       .       A       AG      299.55  .
10 118916       .       G       .       78.15   .
10 118917       .       G       .       75.13   .
10 118918       .       G       .       78.11   .
10 118919       .       A       G       8.07    LowQual

Here is the same region for the same sample in the VCF provided with the full dataset. I do not see anything to suggest an insertion at 118915:

zcat HGDP01284.10.filtered.vcf.gz | grep -A4 -m1 118915

10 118915       .       A       .       38.99   .       AN=2;BaseCounts=14,0,0,0;DP=14;GC=35.66;MQ=34.80;MQ0=1;FL=-1    GT:DP   0/0:14
10 118916       .       G       .       38.99   .       AN=2;BaseCounts=0,0,14,0;DP=14;GC=35.66;MQ=34.80;MQ0=1;FL=-1    GT:DP   0/0:14
10 118917       .       G       .       35.99   .       AN=2;BaseCounts=0,0,14,0;DP=14;GC=35.66;MQ=34.80;MQ0=1;FL=-1    GT:DP   0/0:14
10 118918       .       G       .       38.99   .       AN=2;BaseCounts=0,0,15,0;DP=15;GC=35.41;MQ=34.95;MQ0=1;FL=-1    GT:DP   0/0:15
10 118919       rs201347354     A       G       41.01   .       AC=1;AF=0.500;AN=2;BaseCounts=11,0,3,0;BaseQRankSum=0.623;DB;DP=15;Dels=0.07;FS=0.000;GC=35.16;HaplotypeScore=7.9923;MLEAC=1;MLEAF=0.500;MQ=34.95;MQ0=1;MQRankSum=-0.934;QD=2.73;ReadPosRankSum=0.311;FL=-1      GT:AD:DP:GQ:PL  0/1:11,3:14:41:41,0,348
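To check this across a whole chromosome rather than one region, a short script can list every record whose ALT allele differs in length from REF. This is only a minimal sketch: the function names `classify` and `indel_records` are my own, and it assumes the files are standard tab-delimited VCFs in which a reference call has `.` in the ALT column, as in the excerpts above.

```python
import gzip
import sys


def classify(ref, alt):
    """Classify one REF/ALT pair by comparing allele lengths."""
    if alt == ".":
        return "ref"  # reference call or no-call line: no alternate allele
    if alt.startswith("<") or alt == "*" or "[" in alt or "]" in alt:
        return "other"  # symbolic, spanning-deletion, or breakend allele
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "snv" if len(ref) == 1 else "mnv"


def indel_records(lines):
    """Yield (chrom, pos, ref, alt, kind) for each insertion or deletion allele."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")  # assumes tab-delimited VCF
        if len(fields) < 5:
            continue  # not a complete record
        chrom, pos, _id, ref, alts = fields[:5]
        for alt in alts.split(","):  # ALT may list several alleles
            kind = classify(ref, alt)
            if kind in ("insertion", "deletion"):
                yield chrom, int(pos), ref, alt, kind


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_indels.py HGDP01284.10.filtered.vcf.gz
    with gzip.open(sys.argv[1], "rt") as handle:
        for record in indel_records(handle):
            print(*record, sep="\t")
```

If the script prints nothing for a full-dataset VCF but does print the 118915 insertion for the corresponding pilot VCF, that would support the impression that indels were left out of these files rather than filtered at this one site.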

Many thanks for any help.
