Question: Question on a SNP in the 1000 Genomes Project
3.4 years ago, yanlzhang3 wrote:

Hi there, I am working on a GWAS analysis using 1000 Genomes Project (1000GP) data. I downloaded the VCF file and calculated per-site Fst between my own dataset and the 1000GP data. At one site, chr2:42842851 (hg19; ref C, alt T), almost all samples in my data carry allele C, while the 1000GP samples carry allele T. I suspected something was wrong, so I downloaded one high-coverage 1000GP sample and called it with samtools + bcftools; the result was C, not T. Does anyone know the reason? Thanks a lot.
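
For readers following along, a per-site Fst between two groups can be sketched from allele counts as below. This is only an illustration using the simple heterozygosity-based form, Fst = (Ht - Hs) / Ht, with both groups weighted equally; it is not necessarily the estimator used in the post. The function name `wright_fst` and its arguments are hypothetical.

```python
import math


def wright_fst(alt_count_1, n_alleles_1, alt_count_2, n_alleles_2):
    """Per-site Fst for two groups from ALT allele counts.

    Uses Fst = (Ht - Hs) / Ht with equal weighting of the two groups.
    This is a simplified form (an assumption for illustration); it applies
    no sample-size correction.
    """
    if n_alleles_1 <= 0 or n_alleles_2 <= 0:
        raise ValueError("each group needs at least one observed allele")

    # ALT allele frequency in each group
    p1 = alt_count_1 / n_alleles_1
    p2 = alt_count_2 / n_alleles_2

    # Expected heterozygosity in the pooled population (equal weights)
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)

    # Mean expected heterozygosity within the groups
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2

    if h_total == 0:
        # Site is monomorphic across both groups: Fst is undefined
        return math.nan
    return (h_total - h_within) / h_total
```

With this form, a site where one group is fixed for C and the other is fixed for T gives the maximum value of 1, which is the kind of outlier described above.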

snp alignment genome

According to the 1000G browser it seems all samples are T: http://browser.1000genomes.org/Homo_sapiens/Variation/Population?db=core;r=2:42842351-42843351;v=rs376237627;vdb=variation;vf=58531755

Ensembl EVA also reports it as mostly T: http://www.ebi.ac.uk/eva/?Variant%20Browser&species=hsapiens_grch37&selectFilter=snp&snp=rs376237627&studies=PRJEB4019%2CPRJEB6930%2CPRJEB5439%2CPRJEB5829%2CPRJEB8652%2CPRJEB8650%2CPRJEB8639%2CPRJEB6042%2CPRJX00001%2CPRJEB8705%2CPRJEB8661%2CPRJEB7895%2CPRJEB7217%2CPRJEB7218%2CPRJEB6041&id=rs376237627

Which sample did you test? If you open the "Genotypes" tab from the EVA page, you should find the genotypes by sample. @Emily_Ensembl

written 3.4 years ago by Giovanni M Dall'Olio

I used 39 CEU samples from phase 3. For example, NA06989 is CC in my results, which differs from the genotype in EVA. In fact, almost all of the genotypes in my analysis are CC or CT, which is very strange compared with the published results. My procedure was: download the BAM files from 1000GP, use hs37d5.fa as the reference, and call SNPs directly with samtools + bcftools. Am I missing something? Here is the alignment of HG00759.
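
When comparing calls sample by sample, it can help to translate the GT field of a VCF record into bases, so that a genotype such as 0/0 reads as CC at a C/T site. The following is a minimal sketch, assuming a standard tab-delimited VCF data line whose FORMAT column includes GT. The function name `decode_genotypes`, the sample labels and the example record are all hypothetical, not real 1000GP genotypes.

```python
def decode_genotypes(vcf_line, sample_names):
    """Map each sample to its genotype written as bases (e.g. 'CT').

    vcf_line     : one tab-delimited VCF data line (not a header line)
    sample_names : sample IDs in the same order as the VCF sample columns
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref = fields[3]
    alts = fields[4].split(",")
    alleles = [ref] + alts  # allele index 0 is REF, 1.. are ALT alleles

    # Locate GT inside the FORMAT column (VCF column 9)
    gt_index = fields[8].split(":").index("GT")

    genotypes = {}
    for name, column in zip(sample_names, fields[9:]):
        gt = column.split(":")[gt_index]
        # Treat phased (|) and unphased (/) separators the same way
        calls = gt.replace("|", "/").split("/")
        genotypes[name] = "".join(
            "." if call == "." else alleles[int(call)] for call in calls
        )
    return genotypes


# Made-up record at the site discussed in this thread, for illustration only
example_line = "2\t42842851\t.\tC\tT\t.\t.\t.\tGT\t0/0\t0|1\t1/1\t./."
example_calls = decode_genotypes(example_line, ["S1", "S2", "S3", "S4"])
```

Running the same function over the downloaded 1000GP VCF record and over your own bcftools output would show directly which samples disagree and whether the disagreement is in the genotype index or in the REF/ALT columns.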

modified 3.4 years ago • written 3.4 years ago by yanlzhang3

Ensembl and EVA are different things. EVA is an archive for genetic variants, akin to dbSNP. The alternative allele T is the most frequent allele in all 1KG super populations (see the Population genetics page in Ensembl). The sample genotypes are also available. C seems to be seen mostly in Europeans. Can you point us to the files you've downloaded, please? It may be worth contacting the 1000 Genomes helpdesk.

written 3.4 years ago by Denise - Open Targets