Question on a SNP in the 1000 Genomes Project
7.8 years ago
yanlzhang3 • 0

Hi there, I am working on a GWAS analysis using 1000 Genomes Project data. I downloaded the VCF file and calculated per-site Fst between my own dataset and the 1000 Genomes Project data. I found a site, chr2:42842851 (hg19), ref C, alt T, where almost all samples in my data carry allele C while the samples in 1000GP carry allele T. I suspect something is wrong. So I downloaded one high-coverage 1000GP sample and called variants with samtools + bcftools, and the result is C, not T. Does anyone know the reason? Thanks a lot.
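
For readers who want to see why such a site stands out, here is a minimal sketch of a two-population, per-site Fst computed from allele frequencies. It uses Wright's heterozygosity-based definition with equal population weights, which is an assumption: the post does not say which Fst estimator was used. The function name `wright_fst` and the two frequencies are hypothetical and chosen only to mimic "almost all C" versus "almost all T".

```python
def wright_fst(p1: float, p2: float) -> float:
    """Two-population Fst = (H_T - H_S) / H_T for one biallelic site.

    p1 and p2 are the frequencies of the same allele in each population.
    Both populations get equal weight (an assumption of this sketch).
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity of the pooled population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # Site is monomorphic in both populations: no differentiation.
        return 0.0
    # Mean expected heterozygosity within the two populations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


# Hypothetical frequencies of allele T, for illustration only.
freq_t_own_data = 0.02   # own dataset: almost all C
freq_t_1000gp = 0.95     # 1000GP VCF: almost all T

fst_site = wright_fst(freq_t_own_data, freq_t_1000gp)
print(f"per-site Fst: {fst_site:.3f}")
```

Nearly fixed, opposite alleles in the two datasets push Fst close to 1, which is why a site like this is worth checking by hand.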

SNP alignment genome • 1.7k views

I used 39 CEU samples from phase 3. For example, NA06989 is CC in my result, which differs from the result in EVA. In fact, almost all of the genotypes in my analysis are CC or CT, which is very strange compared with the published result. My procedure is: download the BAM files from 1000GP, use hs37d5.fa as the reference, and call SNPs directly with samtools + bcftools. Am I missing something? Here is the alignment of HG00759.
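
One way to put a number on "almost all CC or CT" is to tally the alternate allele frequency from the called genotypes and compare it with the published frequency for the same samples. The sketch below assumes the genotypes have already been extracted from the VCF as GT strings such as `0/0` or `0|1`. The function name `alt_allele_frequency` and the example genotypes are hypothetical, not the real calls for these samples.

```python
from collections import Counter


def alt_allele_frequency(genotypes):
    """Return the frequency of allele '1' (ALT) among called alleles.

    genotypes: iterable of VCF-style GT strings, e.g. "0/0", "0|1", "./.".
    Missing alleles (".") are skipped. Returns None if nothing was called.
    """
    counts = Counter()
    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") separators the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                counts[allele] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    return counts["1"] / total


# Hypothetical genotypes at one site (0 = C, 1 = T).
example_gts = ["0/0", "0/1", "0|0", "./.", "1/1"]
example_freq = alt_allele_frequency(example_gts)
print(f"ALT (T) frequency among called alleles: {example_freq}")
```

Running the same tally on your own calls and on the 1000GP VCF, restricted to the same sample IDs, shows whether the discrepancy is in the genotypes themselves or only in how the two datasets are being compared.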


Ensembl and EVA are different things. EVA is an archive for genetic variants, akin to dbSNP. The alternative allele T is the most frequent allele in all 1KG super populations (see the Population genetics page in Ensembl). The sample genotypes are also available. C seems to be seen mostly in Europeans. Can you point us to the files you downloaded, please? It may be worth contacting the 1000 Genomes helpdesk.
