Question: How To Extract A Specific Population Vcf File From 1000G Genotypes Vcf File
J.F.Jiang (China) wrote, 6.5 years ago:

I have downloaded the 20101123 release of the raw genotype data in VCF format, and I want to use PLINK to calculate LD for my SNP list.

VCFtools offers a way to convert VCF genotypes to PLINK PED format, but I could not find an option to extract the data for a single population.

The VCFtoped Perl script provided by 1000G only extracts data within a defined region, not whole chromosomes. Besides, its info file differs from PLINK's .map file: it is missing the chromosome column.

So is there an existing method to extract the genotypes of the CEU population in VCF format?

If you know such a method, could you tell me how?

Thank you!

Best for all!

vcf genotyping • 5.8k views
Adam (United States) wrote, 6.5 years ago:

Create a file listing all CEU individuals in the 1000G, and then use:

./vcftools --vcf <vcf_file> --keep CEUlist.txt --out outputfile_prefix --plink

That should do what you want.
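One way to build CEUlist.txt, as a minimal sketch: it assumes a tab-separated panel file with the sample ID in column 1 and the population code in column 2. The file name example.panel and its three rows are made up for illustration, and the input name in the commented vcftools line is a placeholder.

```shell
# Hypothetical panel file: sample ID, population code, platform (tab-separated).
printf 'NA06984\tCEU\tILLUMINA\nNA18486\tYRI\tILLUMINA\nNA06985\tCEU\tILLUMINA\n' > example.panel

# Keep only the sample IDs whose population column is exactly CEU.
awk -F'\t' '$2 == "CEU" { print $1 }' example.panel > CEUlist.txt

# The list can then be passed to VCFtools as in the command above, e.g.
# (placeholder input name; --gzvcf reads a gzipped VCF):
# vcftools --gzvcf genotypes.vcf.gz --keep CEUlist.txt --plink --out ceu
```

The resulting file holds one sample ID per line, which is the layout --keep expects.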


That is great, it works. Another question: the 1000G pilot 1 release provides genotypes encoded in VCF 3.3, while VCFtools requires version 4 or higher. How can I convert the VCF files to a newer version?

written 6.5 years ago by J.F.Jiang

You must be using an older version of VCFtools. The later versions work with VCF versions 4 and higher.

written 6.5 years ago by Adam

I am using the latest version of VCFtools, which can handle v4 VCF files. What I am saying is that this VCF file is encoded in v3.3, which the tool cannot process. Error:VCF version must be v4.0 or v4.1: You are using version VCFv3.3

written 6.5 years ago by J.F.Jiang

My mistake, I missed the vcf-convert utility that ships with VCFtools.
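For reference, a small sketch of how the version check and conversion might look. The tiny VCF 3.3 file below is made up for illustration, and the output name is a placeholder. The conversion step only runs if vcf-convert is on the PATH. It relies on vcf-convert reading VCF from stdin, writing to stdout, and taking the target version via -v. Files containing indels may also need a reference FASTA passed with -r.

```shell
# Tiny made-up VCF 3.3 file, for illustration only.
printf '##fileformat=VCFv3.3\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs0\tA\tG\t.\t0\t.\n' > example.v33.vcf

# The first header line records the VCF version.
version=$(head -n 1 example.v33.vcf | sed 's/^##fileformat=//')
echo "$version"

# Convert only if the file is a 3.x VCF and vcf-convert is installed.
case "$version" in
    VCFv3.*)
        if command -v vcf-convert >/dev/null 2>&1; then
            vcf-convert -v 4.0 < example.v33.vcf > example.v40.vcf
        fi
        ;;
esac
```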

modified 6.5 years ago • written 6.5 years ago by J.F.Jiang
J.F.Jiang (China) wrote, 6.5 years ago:

The clumsy method I am using for now is vcf-subset, which is included in VCFtools: extract the sample IDs of all CEU individuals from the reference panel, and then use vcf-subset -c LABEL xxxxgenotypes.vcf.gz > xxxx.genotype.ceu.vcf.gz

It is still running, and I do not know whether the command is right. This method also does not seem clever enough for a bioinformatician.
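A quick sanity check before a long vcf-subset run is to confirm that every ID in the sample list actually appears in the VCF header. The sketch below uses made-up inputs (a two-sample list and a header-only VCF) purely for illustration. As far as I know, vcf-subset -c accepts either a comma-separated list or a file with one sample ID per line, and it writes uncompressed VCF to stdout, so a .vcf.gz output would need to be piped through bgzip as in the commented line.

```shell
# Made-up inputs for illustration: a sample list and a tiny VCF header.
printf 'NA06984\nNA06985\n' > CEUlist.txt
printf '##fileformat=VCFv4.0\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA06984\tNA18486\n' > example.vcf

# Sample IDs are columns 10 onward of the #CHROM header line.
grep -m 1 '^#CHROM' example.vcf | cut -f 10- | tr '\t' '\n' | sort > vcf_samples.txt

# List the IDs that are in CEUlist.txt but absent from the VCF (comm needs sorted input).
sort CEUlist.txt | comm -23 - vcf_samples.txt > missing.txt
cat missing.txt

# If nothing is missing, the subset itself could be run like this (file names as in the post):
# vcf-subset -c CEUlist.txt xxxxgenotypes.vcf.gz | bgzip -c > xxxx.genotype.ceu.vcf.gz
```

An alternative that stays within the vcftools binary is to combine --keep CEUlist.txt with --recode, which writes the subset as a new VCF instead of PLINK files.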

So if you know any better solution, please tell me.
