Question

Question on a SNP in the 1000 Genomes Project

Posted 9.0 years ago by yanlzhang3

Hi there, I am working on a GWAS analysis using 1000 Genomes Project data. I downloaded the VCF file and calculated per-site Fst between my own dataset and the 1000 Genomes Project data. I found a site, chr2:42842851 (hg19, ref C, alt T), where almost all samples in my data carry allele C while the samples in 1000GP carry allele T. I suspect something is wrong, so I downloaded one high-coverage sample from 1000GP and called it with samtools + bcftools, and the result is C, not T. Does anyone know the reason? Thanks a lot.
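The post doesn't say which Fst estimator was used. The following is a minimal sketch that assumes Hudson's estimator for a single biallelic site: Fst = [(p1 - p2)^2 - p1(1 - p1)/(n1 - 1) - p2(1 - p2)/(n2 - 1)] / [p1(1 - p2) + p2(1 - p1)], where p is the ALT frequency and n is the number of allele copies. It shows why a site where the two datasets are nearly fixed for opposite alleles stands out with Fst close to 1. The function name and the counts are hypothetical.

```python
import math


def hudson_fst(alt1: int, n1: int, alt2: int, n2: int) -> float:
    """Per-site Hudson Fst for a biallelic site (hypothetical helper).

    alt1, alt2: ALT allele counts in dataset 1 and dataset 2
    n1, n2: allele copies sampled (2 x diploid individuals), each > 1
    """
    p1, p2 = alt1 / n1, alt2 / n2
    # Squared frequency difference minus a small-sample correction
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Expected heterozygosity between the two datasets
    den = p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        return math.nan  # both samples fixed for the same allele: Fst undefined
    return num / den


# Hypothetical counts: own cohort almost all C (REF), reference panel mostly T (ALT)
print(hudson_fst(alt1=2, n1=200, alt2=190, n2=200))
```

A very high per-site value like this is expected from genuine population differences, but it is also what a genotyping or reference-allele mismatch between two datasets looks like. That is why checking the raw reads, as done above, is a sensible next step.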

According to the 1000G browser it seems all samples are T: http://browser.1000genomes.org/Homo_sapiens/Variation/Population?db=core;r=2:42842351-42843351;v=rs376237627;vdb=variation;vf=58531755
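To check the "all samples are T" claim directly against the release VCF rather than a browser, you can tally the GT fields of the single data line for this position (for example, one pulled out with tabix). The following is a minimal sketch that assumes a standard tab-separated VCF line with a GT key in FORMAT. The function name is hypothetical and the example line is made up, not real 1000 Genomes data.

```python
from collections import Counter


def count_alleles(vcf_line: str) -> dict:
    """Tally allele bases over all sample genotypes in one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    alleles = [fields[3]] + fields[4].split(",")  # index 0 = REF, 1.. = ALT
    gt_idx = fields[8].split(":").index("GT")      # position of GT in FORMAT
    counts = Counter()
    for sample in fields[9:]:
        gt = sample.split(":")[gt_idx]
        for a in gt.replace("|", "/").split("/"):
            if a != ".":                           # skip missing calls
                counts[alleles[int(a)]] += 1
    return dict(counts)


# Made-up line with three samples; not real 1000 Genomes genotypes
line = "2\t42842851\trs376237627\tC\tT\t100\tPASS\t.\tGT\t1|1\t0|1\t1|1"
print(count_alleles(line))
```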

Ensembl EVA also reports it as mostly T: http://www.ebi.ac.uk/eva/?Variant%20Browser&species=hsapiens_grch37&selectFilter=snp&snp=rs376237627&studies=PRJEB4019%2CPRJEB6930%2CPRJEB5439%2CPRJEB5829%2CPRJEB8652%2CPRJEB8650%2CPRJEB8639%2CPRJEB6042%2CPRJX00001%2CPRJEB8705%2CPRJEB8661%2CPRJEB7895%2CPRJEB7217%2CPRJEB7218%2CPRJEB6041&id=rs376237627

Which sample did you test? If you open the "Genotypes" tab from the EVA page, you should find the genotypes by sample. @Emily_Ensembl
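To compare one specific sample (for example, one you called yourself) with the release, look up its column by name in the `#CHROM` header line and translate the GT indices into bases. The following is a minimal sketch. The helper name is hypothetical, and the sample names and genotypes in the example are invented for illustration.

```python
def sample_genotype(header_line: str, vcf_line: str, sample: str) -> str:
    """Return one sample's genotype as bases (e.g. 'C|T') from a VCF data line."""
    names = header_line.rstrip("\n").split("\t")   # the '#CHROM ...' header line
    fields = vcf_line.rstrip("\n").split("\t")
    col = names.index(sample)                      # ValueError if sample absent
    alleles = [fields[3]] + fields[4].split(",")   # REF followed by ALT(s)
    gt = fields[col].split(":")[fields[8].split(":").index("GT")]
    sep = "|" if "|" in gt else "/"                # phased vs unphased
    return sep.join(a if a == "." else alleles[int(a)] for a in gt.split(sep))


# Made-up header and genotypes, for illustration only
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_A\tSAMPLE_B"
line = "2\t42842851\t.\tC\tT\t.\tPASS\t.\tGT\t1|1\t0|1"
print(sample_genotype(header, line, "SAMPLE_A"))
```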

Reply by Giovanni M Dall'Olio, 9.0 years ago

I used 39 CEU samples from phase 3. For example, NA06989 is CC in my result, which differs from its genotype in EVA. In fact, almost all of the genotypes in my analysis are CC or CT, which is very strange compared with the published result. My procedure is: download the BAM files from 1000GP, use hs37d5.fa as the reference, and call SNPs directly with samtools + bcftools. Am I missing a step? Here is the alignment of HG00759.
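To make the "call directly with samtools + bcftools" step reproducible for just this one position, the following is a minimal sketch that builds and runs a `bcftools mpileup | bcftools call` pipeline. Several details are assumptions, not the poster's actual commands:

- the exact flags used;
- local file names;
- the region string, which uses hs37d5-style chromosome names (`2`, not `chr2`), as in the tview command later in the thread;
- a `.bai` index for the BAM and a `.fai` index for the FASTA.

Adding per-allele depth (AD) shows how many reads support C versus T.

```python
import subprocess


def build_commands(bam: str, ref_fasta: str, chrom: str, pos: int):
    """Build argument lists for `bcftools mpileup | bcftools call` at one position."""
    region = f"{chrom}:{pos}-{pos}"  # explicit beg-end so only this position is used
    mpileup = [
        "bcftools", "mpileup",
        "-f", ref_fasta,                 # reference the BAM was aligned to
        "-r", region,                    # region access needs an indexed BAM
        "-a", "FORMAT/AD,FORMAT/DP",     # per-allele and total read depth
        "-Ou", bam,                      # uncompressed BCF, piped to `call`
    ]
    call = ["bcftools", "call", "-m", "-Ov"]  # multiallelic caller, VCF text out
    return mpileup, call


def run_pipeline(mpileup: list, call: list) -> str:
    """Run `mpileup | call` and return the VCF text."""
    p1 = subprocess.Popen(mpileup, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(call, stdin=p1.stdout, stdout=subprocess.PIPE, text=True)
    p1.stdout.close()  # let mpileup receive SIGPIPE if `call` exits early
    out, _ = p2.communicate()
    p1.wait()
    return out
```

Usage, with hypothetical local file names: `print(run_pipeline(*build_commands("NA06989.bam", "hs37d5.fa", "2", 42842851)))`.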

Reply by yanlzhang3, 9.0 years ago

Ensembl and EVA are different things. EVA is an archive for genetic variants, akin to dbSNP. The alternative allele T is the most frequent allele in all 1KG super-populations (see the Population genetics page in Ensembl), and the sample genotypes are also available. C seems to be seen mostly in Europeans. Can you point us to the files you downloaded, please? It may be worth contacting the 1000 Genomes helpdesk.
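The per-population picture (T frequent everywhere, C mostly in Europeans) can be recomputed from the sample genotypes, given a mapping from sample to (super-)population. The following is a minimal sketch that assumes that mapping is already loaded as a plain dict. The function name, sample IDs and labels are hypothetical.

```python
from collections import defaultdict


def alt_freq_by_population(genotypes: dict, sample_pop: dict) -> dict:
    """ALT allele frequency per population from GT strings such as '0|1'.

    Any non-zero allele index counts as ALT (fine for a biallelic site).
    """
    alt = defaultdict(int)
    total = defaultdict(int)
    for sample, gt in genotypes.items():
        pop = sample_pop[sample]
        for a in gt.replace("|", "/").split("/"):
            if a == ".":
                continue            # missing call: not counted
            total[pop] += 1
            if a != "0":
                alt[pop] += 1
    return {pop: alt[pop] / total[pop] for pop in total}


# Hypothetical samples and labels, not real 1000 Genomes data
gts = {"S1": "1|1", "S2": "0|1", "S3": "0|0", "S4": "1|1"}
pops = {"S1": "EUR", "S2": "EUR", "S3": "EUR", "S4": "EAS"}
print(alt_freq_by_population(gts, pops))
```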

Reply by Denise CS, 9.0 years ago

Hi, one of the files is http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18525/high_coverage_alignment/NA18525.wgs.ILLUMINA.bwa.CHB.high_cov_pcr_free.20140203.bam, and I used the following command: samtools tview -p 2:42842851 http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18525/high_coverage_alignment/NA18525.wgs.ILLUMINA.bwa.CHB.high_cov_pcr_free.20140203.bam
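samtools tview is interactive, so it lets you eyeball the reads but not count them. One alternative is the text output of `samtools mpileup -f hs37d5.fa -r 2:42842851-42842851 <bam>`. Its fifth column encodes the read bases:

- `.` or `,` is a match to the reference on the forward or reverse strand.
- A letter is a mismatch.
- `^` plus one quality character marks a read start.
- `$` marks a read end.
- `+N` or `-N` followed by N bases is an indel.
- `*` is a deleted base.

The following is a minimal parser sketch for that column. It assumes `-f` was given, so the reference-base column is real, and it silently ignores other symbols. The example string is made up.

```python
import re


def count_pileup_bases(ref_base: str, bases: str) -> dict:
    """Count alleles in the read-bases column of samtools mpileup output."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0, "N": 0, "*": 0}
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":          # read start; the next char is mapping quality
            i += 2
            continue
        if c == "$":          # read end
            i += 1
            continue
        if c in "+-":         # indel: +N/-N then N bases, all skipped
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
            continue
        if c in ".,":         # matches the reference base on either strand
            counts[ref_base.upper()] += 1
        elif c.upper() in counts:
            counts[c.upper()] += 1
        i += 1
    return counts


# Made-up pileup column at a C reference position
print(count_pileup_bases("C", "..,,^]T$t+2AGt.-1C,"))
```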

Reply by yanlzhang3, 9.0 years ago