7.8 years ago
yanlzhang3
Hi there, I am working on a GWAS analysis using 1000 Genomes Project data. I downloaded the VCF file and calculated per-site Fst between my own dataset and the 1000 Genomes Project data. I found a site, chr2:42842851 (hg19), ref C, alt T, where almost all samples in my data carry allele C while the samples in 1000GP carry allele T. I suspect something is wrong, so I downloaded one high-coverage sample from 1000GP and called it with samtools+bcftools. The result is C, not T. Does anyone know the reason? Thanks a lot.
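To illustrate why a site like this stands out in a per-site Fst scan, here is a minimal sketch of a two-population Fst computed from allele frequencies. It assumes a simple Wright-style estimator with equal population weights, which is not necessarily the estimator used in the question; the function name and the example frequencies are hypothetical.

```python
def wright_fst(p1, p2):
    """Two-population Fst from the frequency of one allele in each population.

    Assumption: equal weighting of both populations and no sample-size
    correction. This is an illustrative estimator, not the one from the post.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity in the pooled population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within the two populations.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        # Site is monomorphic across both populations; Fst is undefined,
        # so this sketch reports 0.0 by convention.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical frequencies of allele T: nearly absent in one dataset,
# nearly fixed in the other.
print(wright_fst(0.02, 0.98))
```

When one dataset is nearly fixed for C and the other nearly fixed for T, the within-population heterozygosity is close to zero and Fst approaches 1, which is why such a site is worth checking for a calling or allele-coding discrepancy.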
According to the 1000G browser it seems all samples are T: http://browser.1000genomes.org/Homo_sapiens/Variation/Population?db=core;r=2:42842351-42843351;v=rs376237627;vdb=variation;vf=58531755
Ensembl EVA also reports it as mostly T: http://www.ebi.ac.uk/eva/?Variant%20Browser&species=hsapiens_grch37&selectFilter=snp&snp=rs376237627&studies=PRJEB4019%2CPRJEB6930%2CPRJEB5439%2CPRJEB5829%2CPRJEB8652%2CPRJEB8650%2CPRJEB8639%2CPRJEB6042%2CPRJX00001%2CPRJEB8705%2CPRJEB8661%2CPRJEB7895%2CPRJEB7217%2CPRJEB7218%2CPRJEB6041&id=rs376237627
Which sample did you test? If you open the "Genotypes" tab from the EVA page, you should find the genotypes by sample. @Emily_Ensembl
I used 39 CEU samples from phase 3. For example, NA06989 is CC in my results, which differs from the result in EVA. In fact, almost all of the genotypes in my analysis are CC or CT, which is very strange compared with the published results. My procedure was: download the BAM files from 1000GP, use hs37d5.fa as the reference, and call SNPs directly with samtools+bcftools. Am I missing something? Here is the alignment of HG00759
Ensembl and EVA are different things. EVA is an archive for genetic variants akin to dbSNP. The alternative allele T is the most frequent allele in all 1KG super populations (see the Population genetics page in Ensembl). The sample genotypes are also available. C seems to be seen mostly in Europeans. Can you point us to the files you've downloaded, please? It may be worth contacting the 1000 Genomes helpdesk.
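When comparing published per-sample genotypes with one's own calls, it can help to translate the VCF GT field into actual bases so both are expressed the same way. The following is a small sketch under that assumption; the helper name decode_gt is hypothetical, and the REF/ALT values are the ones given in the question (ref C, alt T).

```python
import re


def decode_gt(gt, ref, alts):
    """Translate a VCF GT string such as '0|1' or '1/1' into bases.

    Index 0 refers to REF, index 1 to the first ALT, and so on.
    Missing alleles ('.') are passed through unchanged.
    """
    alleles = [ref] + list(alts)
    bases = []
    for idx in re.split(r"[|/]", gt):
        bases.append("." if idx == "." else alleles[int(idx)])
    return "".join(bases)


# With REF=C and ALT=T, a GT of 1|1 means TT, while 0/0 means CC.
print(decode_gt("1|1", "C", ["T"]))
print(decode_gt("0/0", "C", ["T"]))
```

This only normalises notation; it does not explain a real difference between two call sets.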
Hi, one of the files is http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18525/high_coverage_alignment/NA18525.wgs.ILLUMINA.bwa.CHB.high_cov_pcr_free.20140203.bam, and I used the following command:
samtools tview -p 2:42842851 http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA18525/high_coverage_alignment/NA18525.wgs.ILLUMINA.bwa.CHB.high_cov_pcr_free.20140203.bam