Hi Biostars,
I would like to calculate LD for the genome-wide significant variants reported in the most recent schizophrenia GWAS paper (see Supp Table 2) using the newest release of 1000 Genomes: the 20130502 release found here. Note that this is a newer version than the one used for imputation in that paper.
Some of the genome-wide significant variants are missing from the 20130502 release but present in the older 20110521 release. For example, the variant chr2_200825237_I is found in release 20110521:
tabix -p vcf ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/ALL.chr2.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 2:200825236-200825237 | awk '{print $1,$2,$3,$4}'
2 200825237 rs199532108 A
but is not found in the newer release:
tabix -p vcf ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz 2:200825236-200825237 | awk '{print $1,$2,$3,$4}'
This last command returns nothing (no variants found in that range). I don't understand how a variant can be found in a smaller sample (older version) but not found in the larger and overlapping sample (newer version). Any help would be greatly appreciated!
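One further check I could run, sketched below, is to query a wider window around the position in the newer release. This rests on an assumption I have not confirmed: that the indel might be recorded at a slightly shifted coordinate in 20130502 rather than dropped altogether. The helper names `padded_region` and `query_window` and the 50 bp padding are made up for illustration; only `tabix` and `cut` are real tools here.

```shell
#!/bin/sh
# Hypothetical helper: build a tabix region string "chrom:start-end"
# padded by `pad` bases on each side of `pos` (start is clamped to 1).
padded_region() {
  chrom=$1
  pos=$2
  pad=$3
  start=$((pos - pad))
  if [ "$start" -lt 1 ]; then
    start=1
  fi
  end=$((pos + pad))
  echo "${chrom}:${start}-${end}"
}

# Hypothetical wrapper: print CHROM, POS, ID, REF, ALT for every record
# in the padded window of a bgzipped, tabix-indexed VCF (local path or URL).
query_window() {
  vcf=$1
  tabix "$vcf" "$(padded_region "$2" "$3" "$4")" | cut -f1-5
}
```

For example, `query_window <20130502 chr2 VCF URL from the command above> 2 200825237 50` would list any records within 50 bp of the position. Printing ALT as well as REF should make it easier to see whether a nearby record is the same insertion.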
Thanks,
Jason
Thanks! I guess this raises the question of whether this SNP is a technical artifact. And if it is, is the case vs. control signal observed at this SNP also a technical artifact, perhaps a result of poor imputation?