I’m trying to incorporate SNP data from 1000Genomes into my exome data. Since there are no exome VCFs available, I downloaded the 1000Genomes whole-genome sequencing (WGS) data and filtered it to the genomic positions of my variants (taken from the PLINK .bim file). My data is referenced to hg19, so I used the GRCh37 version of 1000Genomes found at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ ("ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz", etc.). However, when I compared the two datasets (using PLINK 1.9 to open and filter the VCFs), I was surprised to find only ~25% of my exome variants in the 1000Genomes WGS data. For example, I have 300,000 SNPs on chromosome 1, but only 80,000 of them were found in the 1000Genomes WGS chromosome 1 file. When I used the "exome pull down targets" data (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/exome_pull_down_targets/) to focus my search, I got very similar results. I expected some differences, but 75% "missingness" does not seem right. Any suggestions?
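To make the comparison concrete, here is a minimal Python sketch of the kind of overlap check described above: it matches variants by chromosome and base-pair position rather than by variant ID, and treats "chr1" and "1" as the same chromosome. The function names and the tiny in-memory example are illustrative assumptions, not the exact commands I ran, and the sketch does not compare alleles or handle multi-allelic sites.

```python
from typing import Iterable, Set, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based bp position)


def normalize_chrom(chrom: str) -> str:
    """Strip an optional 'chr' prefix so 'chr1' and '1' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def bim_sites(lines: Iterable[str]) -> Set[Site]:
    """Collect (chromosome, position) pairs from PLINK .bim lines.

    A .bim line has six whitespace-separated columns: chromosome,
    variant ID, genetic distance, bp position, allele 1, allele 2.
    Only columns 1 and 4 are used here; variant IDs are ignored.
    """
    sites: Set[Site] = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        sites.add((normalize_chrom(fields[0]), int(fields[3])))
    return sites


def shared_fraction(query: Set[Site], reference: Set[Site]) -> float:
    """Fraction of query sites whose position also occurs in reference."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)


# Hypothetical toy data standing in for the two .bim files.
exome_bim = [
    "chr1 . 0 100 A G",
    "1 rs2 0 200 C T",
    "1 rs3 0 300 G A",
    "1 rs4 0 400 T C",
]
kg_bim = [
    "1 rs1 0 100 A G",
    "1 rs9 0 250 C T",
]

print(shared_fraction(bim_sites(exome_bim), bim_sites(kg_bim)))
```

With real data, the two lists would be replaced by the lines of the exome .bim file and of a .bim file produced from the 1000Genomes VCF. If the position-based fraction comes out much higher than an ID-based comparison, the difference would point to variant naming rather than truly absent sites.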
Question: comparison of exome data to the 1000Genomes WGS data
2.2 years ago by
gabili • 0