Hi,
I downloaded what I think is the latest set of SNP variant calls for the European samples from the 1000 Genomes Project here: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/supporting/EUR.2of4intersection_allele_freq.20100804.genotypes.vcf.gz
I used tabix to download the data and vcftools to convert it to a matrix of 0, 1 and 2, with -1 for missing genotypes.
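For reference, the encoding can be sketched roughly as below. This is only an illustration of the 0/1/2/-1 scheme in Python, not vcftools' actual code; the function name `encode_genotype` is made up for this example, and it assumes the genotype is the first colon-separated subfield of each sample column, as in a standard VCF.

```python
import re

def encode_genotype(sample_field: str) -> int:
    """Return the number of non-reference alleles (0, 1 or 2), or -1 if missing."""
    # The genotype (GT) is assumed to be the first colon-separated subfield.
    gt = sample_field.split(":", 1)[0]
    # Alleles are separated by "/" (unphased) or "|" (phased).
    alleles = re.split(r"[/|]", gt)
    # Any "." allele (or an empty field) means the genotype was not called.
    if any(a in (".", "") for a in alleles):
        return -1
    # Count every allele that is not the reference allele "0".
    return sum(1 for a in alleles if a != "0")

for field in ["0/0", "0|1:35", "1/1", "./."]:
    print(field, encode_genotype(field))
```

So a site where every sample has `./.` ends up as a whole column of -1.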
I'm interested in SNPs in the exons of a particular gene. The problem is that, of the 13,000 SNPs for which there is data, a huge number have missing values: the genotype is missing for nearly 5,000 of them.
Does anyone have any idea why there are so many missing genotypes in this data?
Thanks!
Are the genotypes missing for all samples at a given site, or are you saying that, of the 13,000 SNPs, 5,000 have at least one missing genotype?
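One quick way to tell the two cases apart is to count the missing calls per site. Below is a minimal Python sketch, assuming the matrix has been loaded as a list of rows with one row per sample and one column per site, and that -1 marks a missing genotype. The function name `summarize_missing` and the toy data are hypothetical, just for illustration.

```python
def summarize_missing(matrix, missing=-1):
    """Classify sites by how many samples have a missing genotype."""
    n_samples = len(matrix)
    n_sites = len(matrix[0]) if matrix else 0
    # Number of samples with a missing call at each site (column).
    per_site = [sum(1 for row in matrix if row[j] == missing)
                for j in range(n_sites)]
    all_missing = sum(1 for c in per_site if c == n_samples)
    partly_missing = sum(1 for c in per_site if 0 < c < n_samples)
    return {
        "sites": n_sites,
        "all_missing": all_missing,        # missing in every sample
        "partly_missing": partly_missing,  # missing in some samples only
        "complete": n_sites - all_missing - partly_missing,
    }

# Toy matrix: 3 samples (rows) x 4 sites (columns).
toy = [
    [0, -1, 2, -1],
    [1, -1, 2,  0],
    [0, -1, 1,  0],
]
print(summarize_missing(toy))
```

If `all_missing` accounts for most of the affected sites, the gaps are per site rather than per sample.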
Hi, in most cases the genotypes seem to be missing in all samples at a given site.