Question: Using VCFtools to find which population codes individuals in a vcf file belong to
severalorks wrote (4.3 years ago):

EDIT: The population code for each individual is listed separately on the 1000 Genomes site, so the question has essentially been answered.

I've been looking through phased vcf data from 1000 Genomes, specifically these files: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/

This has been a helpful resource so far for deciphering vcf format: https://samtools.github.io/hts-specs/VCFv4.2.pdf

However, I still have questions about identifying which population code each individual in the data belongs to. For example, which population code does individual HG00096 belong to? Is it possible to find all individuals from group GBR? The INFO column lists the allele frequency for each super population at each recorded position, but I'm looking for the more specific population codes.

I think VCFtools may allow me to accomplish this. I've looked through the manual here: https://vcftools.github.io/man_latest.html and haven't found the answer yet, but I'm still searching through it.

So how can I use VCFtools to find which population codes individuals in a vcf file belong to? If VCFtools can't do this, where else can I get this information?

Vincent Laufer wrote (4.3 years ago):

Hello,

The information is not contained in the VCF file itself, unless you extract it from the genomic information ...

Rather, the mapping is contained in a separate file. First, have a look here: http://www.1000genomes.org/category/population/

You might also look here for an alternative way to extract subpopulations of a given type: http://browser.1000genomes.org/Homo_sapiens/UserData/SelectSlice

The mapping you are looking for is in this spreadsheet: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130606_sample_info/20130606_sample_info.xlsx

It is linked from this page: http://www.1000genomes.org/faq/can-i-get-phenotype-gender-and-family-relationship-information-samples/
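As a rough sketch of the lookup itself (not part of the original answer): once the sample information is saved as tab-delimited text, a few lines of Python can answer both "which population is HG00096 in?" and "who is in GBR?". The column names `Sample` and `Population` and the function names are assumptions for illustration; check them against the header row of your own export.

```python
import csv
from collections import defaultdict


def load_sample_populations(lines, sample_col="Sample", pop_col="Population"):
    """Map sample ID -> population code from tab-delimited lines with a header row.

    The default column names are assumptions; adjust them to match your file.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[sample_col]: row[pop_col] for row in reader}


def samples_by_population(sample_to_pop):
    """Invert the mapping: population code -> sorted list of sample IDs."""
    groups = defaultdict(list)
    for sample, pop in sample_to_pop.items():
        groups[pop].append(sample)
    return {pop: sorted(ids) for pop, ids in groups.items()}
```

With a hypothetical export named `sample_info.tsv`, you would pass an open file handle (`open("sample_info.tsv", newline="")`) to `load_sample_populations`, look up an individual with `mapping["HG00096"]`, and list a group with `samples_by_population(mapping)["GBR"]`.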

You mention VCFtools. A next step could be to extract only the samples from the population you want into a smaller VCF file, or to run whatever analysis you have in mind on that subset with VCFtools.
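A minimal sketch of that subsetting step, assuming you already have the sample IDs for one population (however you obtained them). VCFtools accepts a plain-text file with one sample ID per line via `--keep`, and `--recode` writes a new VCF. The helper names and any file paths you pass in are hypothetical; the code only writes the keep file and builds the command, it does not run VCFtools.

```python
from pathlib import Path


def write_keep_file(sample_ids, path):
    """Write one sample ID per line, the format VCFtools --keep expects."""
    Path(path).write_text("".join(f"{s}\n" for s in sample_ids))
    return str(path)


def build_vcftools_command(vcf_gz, keep_file, out_prefix):
    """Return the argument list for subsetting a gzipped VCF to the kept samples."""
    return [
        "vcftools",
        "--gzvcf", vcf_gz,         # input is a bgzipped/gzipped VCF
        "--keep", keep_file,       # file listing the individuals to retain
        "--recode",                # write a new VCF with the retained samples
        "--recode-INFO-all",       # carry the INFO fields over unchanged
        "--out", out_prefix,       # prefix for the output files
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)` if VCFtools is installed. Note that `--recode-INFO-all` copies INFO values as they are, so allele frequencies in INFO still describe the full sample set, not the subset.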

I recently did something similar using Plink2 and then Plink1. If you would like that code for reference, I can append it. Does this answer your question?

severalorks replied (4.3 years ago):

Yes, thank you. I'd already found the population codes elsewhere, but your additional information was very helpful too.