How to obtain subpopulation-specific allelic variants from a .vcf file using PLINK?
4 months ago

I have downloaded the .vcf.gz files from the 1000 Genomes FTP site. I am trying to study the genetic variation within a subpopulation for a particular gene. I am new to plink and not yet acquainted with all of its functions. I have converted the .vcf files to .bed, .bim and .fam formats, but I am not sure whether the subpopulation IDs are present in these files or whether I have to download the .ped files from the 1000 Genomes website. Any insight on obtaining the allelic variants for a subpopulation is appreciated.

.vcf plink 1000genomes subpopulation • 277 views
7 weeks ago
Rashmi ▴ 20

You can use the VCF to PED converter (available as an online tool and as a Perl script) provided by the 1000 Genomes Project to convert the VCF to PED format: https://www.internationalgenome.org/vcf-ped-converter/#api-script

In the script you can specify the population you want, and thus, get the data just for the specific population.

perl vcf_to_ped_converter.pl -vcf chr13.vcf.gz -sample_panel_file ALL.sample_panel -region 13:32889611-32973805 -population GBR


7 weeks ago
1. Subpopulation IDs are not present in the .vcf.gz files. You must download either the 1000 Genomes .ped file (note that this is NOT a "plink .ped"-formatted file, despite the identical extension), or another resource that contains the same information (e.g. https://www.cog-genomics.org/plink/2.0/resources#1kg_phase3 ).

2. plink's --keep flag provides a simple way to look at a subpopulation. If you use the plink2-formatted files with subpopulation IDs built-in, --keep-if is more direct.

3. Most scripts involving plink .ped files are obsolete in 2021. plink has been able to read and write VCF files more efficiently than .ped files since 2014. As of this writing, plink 2.0 can't read/write .ped files at all. (In contrast, .bed+.bim+.fam works well with all plink versions.)
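
As a rough illustration of points 1 and 2, here is a minimal Python sketch that builds a --keep file from a sample-to-population table. It assumes a tab-separated panel file with a header containing "sample" and "pop" columns (the layout I would expect from the 1000 Genomes sample panel; check your copy), and it assumes plink set FID equal to IID when it imported the VCF, so confirm that against your .fam file. The function name keep_lines, the output name gbr_keep.txt and the example rows are made up for illustration.

```python
import csv
import io


def keep_lines(panel_text, population):
    """Return one 'FID<TAB>IID' line per sample in the requested population.

    Assumes a tab-separated table with 'sample' and 'pop' header columns,
    and that FID == IID in the plink fileset (check your .fam file).
    """
    reader = csv.DictReader(io.StringIO(panel_text), delimiter="\t")
    lines = []
    for row in reader:
        if row["pop"] == population:
            sample = row["sample"]
            lines.append(f"{sample}\t{sample}")
    return lines


# Small illustrative excerpt in the assumed layout, not the real file.
EXAMPLE_PANEL = (
    "sample\tpop\tsuper_pop\tgender\n"
    "HG00096\tGBR\tEUR\tmale\n"
    "HG00171\tFIN\tEUR\tfemale\n"
    "HG00097\tGBR\tEUR\tfemale\n"
)

if __name__ == "__main__":
    # In practice, read the downloaded panel file instead of EXAMPLE_PANEL.
    with open("gbr_keep.txt", "w") as handle:
        for line in keep_lines(EXAMPLE_PANEL, "GBR"):
            handle.write(line + "\n")
```

The resulting file can then be passed to plink along the lines of: plink --bfile yourdata --keep gbr_keep.txt --make-bed --out yourdata_GBR (the "yourdata" prefix is a placeholder).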