Question: Best way to convert VCF to PLINK file format and merge chromosomes?
5.4 years ago by nchuang (United States):

I am trying to convert the 1000 Genomes (1000G) genotypes into PLINK format so that I can run a PCA.

I used PLINK 1.9 to recode all the vcf.gz files to binary .bed filesets, and I am now using --merge-list to merge the chromosomes into one fileset. Should I be worried about the warnings about multiple positions for variants? If that is a problem, why was it not mentioned during the VCF to PLINK conversion, and how can an rsID have more than one position, unless the warning means more than one base pair, as for a structural variant? I am also not sure what the "multiple chromosomes seen" warning means, unless it is an error.
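For reference, the conversion and merge steps described above can be sketched roughly as follows. This is a minimal Python sketch that only builds the PLINK 1.9 command lines and the merge-list file; it does not run PLINK. The file names (chr1.vcf.gz, merge_list.txt, all_autosomes) and the helper names are hypothetical, not taken from the original post.

```python
import shlex
from pathlib import Path


def conversion_command(vcf_path, out_prefix):
    """PLINK 1.9 call that converts one VCF into a .bed/.bim/.fam fileset."""
    return ["plink", "--vcf", str(vcf_path), "--make-bed", "--out", str(out_prefix)]


def write_merge_list(prefixes, list_path):
    """Write one fileset prefix per line (the single-token merge-list form)."""
    Path(list_path).write_text("".join(f"{prefix}\n" for prefix in prefixes))


def merge_command(first_prefix, list_path, out_prefix):
    """PLINK 1.9 call that merges the listed filesets into the first one."""
    return ["plink", "--bfile", str(first_prefix), "--merge-list", str(list_path),
            "--make-bed", "--out", str(out_prefix)]


def show_plan(chromosomes=range(1, 23)):
    """Print the commands without running them; file names are hypothetical."""
    prefixes = [f"chr{c}" for c in chromosomes]
    for prefix in prefixes:
        print(shlex.join(conversion_command(f"{prefix}.vcf.gz", prefix)))
    # chr1 is the starting fileset; the merge list would hold the other prefixes.
    print(shlex.join(merge_command(prefixes[0], "merge_list.txt", "all_autosomes")))
```

Before running the merge command, the merge list would be written with something like write_merge_list(["chr2", ..., "chr22"], "merge_list.txt"), so that chr1 serves as the starting fileset and the rest are merged into it.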

I also assume that I should merge my case population with the 1kG dataset and then prune by LD. After that, can I use PLINK to make an MDS plot, or should I use GCTA?
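The merge, prune, and PCA/MDS sequence asked about here might look something like the sketch below. It only builds PLINK 1.9 argument lists and does not run anything. The fileset prefixes and the pruning parameters (window 50, step 5, r^2 threshold 0.2) are assumptions chosen for illustration, not values from the thread.

```python
def merge_with_reference(case_prefix, ref_prefix, out_prefix):
    """Merge the case fileset with the 1kG fileset (prefix form of --bmerge)."""
    return ["plink", "--bfile", case_prefix, "--bmerge", ref_prefix,
            "--make-bed", "--out", out_prefix]


def ld_prune(prefix, out_prefix, window=50, step=5, r2=0.2):
    """Write <out_prefix>.prune.in / .prune.out; the parameters are assumptions."""
    return ["plink", "--bfile", prefix,
            "--indep-pairwise", str(window), str(step), str(r2),
            "--out", out_prefix]


def pca(prefix, prune_in, out_prefix, n_components=10):
    """PCA restricted to the LD-pruned variants."""
    return ["plink", "--bfile", prefix, "--extract", prune_in,
            "--pca", str(n_components), "--out", out_prefix]


def mds(prefix, prune_in, out_prefix, dims=10):
    """MDS alternative; --mds-plot is used together with --cluster."""
    return ["plink", "--bfile", prefix, "--extract", prune_in,
            "--cluster", "--mds-plot", str(dims), "--out", out_prefix]


# Hypothetical prefixes, in the order described in the post.
steps = [
    merge_with_reference("cases", "all_autosomes", "cases_1kg"),
    ld_prune("cases_1kg", "cases_1kg_ld"),
    pca("cases_1kg", "cases_1kg_ld.prune.in", "cases_1kg_pca"),
]
```

Each entry in steps could be passed to subprocess.run once PLINK is on the PATH; mds() can be swapped in for pca() if an MDS plot is preferred.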

Just saw this:!topic/plink2-users/RNztDLWCfB8

I guess those SNPs in 1kG are multiallelic?


Actually, going back, I see that the VCF to PLINK conversion already filters for only biallelic loci, so I don't understand how I would get multiallelic sites.

written 5.4 years ago by nchuang

For multiallelic sites, PLINK 1.9 defaults to keeping only the reference allele and the most common alternate allele; any call involving a lower-frequency alt allele is treated as missing data. If you want such sites to be entirely skipped, you need to add the --biallelic-only flag.
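To make the difference between the two behaviours concrete, here is a toy Python model of the rule described above. It is only an illustration of the stated behaviour, not PLINK's actual code. The function name, the genotype encoding (pairs of allele indices, 0 = reference), and the use of None for missing calls and skipped sites are all assumptions made for the sketch.

```python
from collections import Counter


def recode_site(genotypes, biallelic_only=False):
    """Toy model of one site; genotypes are (allele, allele) index pairs, 0 = ref.

    Returns None if the site is skipped, otherwise a list in which calls
    involving a dropped alt allele become None (missing) and the kept alt
    allele is renumbered to 1.
    """
    alt_counts = Counter(a for gt in genotypes for a in gt if a != 0)
    if biallelic_only and len(alt_counts) > 1:
        return None  # site skipped entirely, as with --biallelic-only
    if not alt_counts:
        return list(genotypes)  # monomorphic site, nothing to recode
    kept_alt = alt_counts.most_common(1)[0][0]  # most common alternate allele
    recoded = []
    for gt in genotypes:
        if all(a in (0, kept_alt) for a in gt):
            recoded.append(tuple(0 if a == 0 else 1 for a in gt))
        else:
            recoded.append(None)  # call involves a lower-frequency alt allele
    return recoded


site = [(0, 1), (1, 1), (0, 2), (0, 0)]
default_result = recode_site(site)
skipped_result = recode_site(site, biallelic_only=True)
```

Here default_result keeps the site but sets the (0, 2) call to missing, while skipped_result is None because the site is dropped altogether.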

written 5.4 years ago by chrchang523

I see. So if I understand you correctly, even though it says it is filtering for biallelic loci, it is really just setting calls that involve the third allele to missing? And if I use the --biallelic-only flag, it will skip that SNV entirely?

While browsing Google and your threads, I found the genetics for fun blog, which has exactly what I needed. It was not easy to find despite the obvious title, so I'll post it here for future reference:

written 5.4 years ago by nchuang

Also, what are your thoughts on using GATK's VariantsToBinaryPed for VCF to PLINK conversion?

modified 5.4 years ago • written 5.4 years ago by nchuang
2.5 years ago by Kevin Blighe (Republic of Ireland):

Follow my tutorial here for best practices on doing this: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2

