Question: Best way to convert VCF to PLINK file format and merge chromosomes?
nchuang (United States) wrote 4.2 years ago:

I am trying to convert the 1000 Genomes (1000G) genotypes into PLINK format so that I can run a PCA.

I used PLINK 1.9 to convert all of the vcf.gz files to binary .bed filesets, and I am now using --merge-list to merge the chromosomes into one fileset. Should I be worried about the warnings about variants with multiple positions? If that is an issue, why was it not mentioned during the VCF-to-PLINK conversion? And how can an rsID have more than one position, unless the warning means it spans more than one base pair, as a structural variant would? I am also not sure what the warnings about multiple chromosomes mean, unless they indicate an error.
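A minimal Python sketch of the conversion and merge steps described above. It assumes hypothetical per-chromosome inputs named `chr1.vcf.gz` through `chr22.vcf.gz` (the real 1000G file names are longer) and autosomes only. It builds the PLINK 1.9 command lines and writes a `--merge-list` file with one binary fileset prefix per line; it does not run PLINK itself.

```python
import shlex
from pathlib import Path

# Autosomes only; add "X" etc. if needed (assumption for this sketch)
CHROMOSOMES = [str(c) for c in range(1, 23)]

def conversion_command(chrom: str, vcf_pattern: str = "chr{chrom}.vcf.gz") -> list[str]:
    """PLINK 1.9 command converting one chromosome's VCF to a binary fileset."""
    vcf = vcf_pattern.format(chrom=chrom)  # hypothetical file naming
    return ["plink", "--vcf", vcf, "--make-bed", "--out", f"chr{chrom}"]

def merge_list_lines(chroms: list[str]) -> list[str]:
    """One fileset prefix per line; PLINK reads <prefix>.bed/.bim/.fam."""
    return [f"chr{c}" for c in chroms]

def merge_command(list_path: str, out: str = "all_chr") -> list[str]:
    """PLINK 1.9 command merging the filesets named in the merge list."""
    return ["plink", "--merge-list", list_path, "--make-bed", "--out", out]

if __name__ == "__main__":
    for chrom in CHROMOSOMES:
        print(shlex.join(conversion_command(chrom)))
    list_path = Path("mergelist.txt")
    list_path.write_text("\n".join(merge_list_lines(CHROMOSOMES)) + "\n")
    print(shlex.join(merge_command(str(list_path))))
```

Printing the commands rather than executing them keeps the sketch easy to inspect; the output can be pasted into a shell or a job script.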

Also, I assume I should merge my case population with the 1kG dataset and then LD-prune the merged set. After that, can I use PLINK to make an MDS plot, or should I use GCTA?
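One way to express the merge, LD-prune, PCA order as PLINK 1.9 commands, again as a Python sketch that only prints the command lines. The fileset names (`all_chr`, `cases`, `combined`, `prune`, `pca`) and the `--indep-pairwise 50 5 0.2` window/step/r² values are illustrative assumptions, not recommendations from this thread. For an MDS plot instead of PCA, PLINK's `--cluster --mds-plot` would replace the last step.

```python
import shlex

def pipeline(ref="all_chr", cases="cases", window=50, step=5, r2=0.2, n_pcs=10):
    """Merge -> LD-prune -> PCA as PLINK 1.9 command lines (names are hypothetical)."""
    merged = "combined"
    return [
        # 1. Merge the case fileset into the 1000G reference fileset
        ["plink", "--bfile", ref, "--bmerge", cases, "--make-bed", "--out", merged],
        # 2. Find an approximately independent SNP set (writes prune.prune.in)
        ["plink", "--bfile", merged, "--indep-pairwise", str(window), str(step), str(r2),
         "--out", "prune"],
        # 3. Run PCA on the pruned SNPs only
        ["plink", "--bfile", merged, "--extract", "prune.prune.in", "--pca", str(n_pcs),
         "--out", "pca"],
    ]

if __name__ == "__main__":
    for cmd in pipeline():
        print(shlex.join(cmd))
```

Pruning after the merge means the retained SNP set is chosen on the combined samples, which matches the order proposed in the question.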

Just saw this: https://groups.google.com/forum/#!topic/plink2-users/RNztDLWCfB8

I guess those SNPs in 1kG are multiallelic?

Tags: plink

Actually, going back, I saw that the VCF-to-PLINK conversion already filters for biallelic loci only, so I don't understand how I would get multiallelic sites...

Reply by nchuang, 4.2 years ago
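To see which IDs trigger the "multiple positions" and "multiple chromosomes" warnings, a sketch like the following (an illustration, not part of PLINK) scans `.bim` records and reports every ID that maps to more than one (chromosome, position) pair. One possible explanation, which this thread does not confirm, is that variants without an rsID carry the placeholder ID `.` in the VCF, so unrelated sites share one ID; an rsID reused on several records would also show up here. The six-column `.bim` layout (chromosome, ID, cM, bp, allele 1, allele 2) is standard; the paths passed to `scan_bim_files` are up to you.

```python
from collections import defaultdict
from itertools import chain
from pathlib import Path

def conflicting_ids(bim_lines):
    """Map each variant ID seen at >1 (chromosome, bp) pair to those pairs."""
    seen = defaultdict(set)
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, vid, bp = fields[0], fields[1], int(fields[3])
        seen[vid].add((chrom, bp))
    return {vid: sorted(locs) for vid, locs in seen.items() if len(locs) > 1}

def scan_bim_files(paths):
    """Run the check across several .bim files (e.g. one per chromosome)."""
    return conflicting_ids(
        chain.from_iterable(Path(p).read_text().splitlines() for p in paths))

if __name__ == "__main__":
    demo = ["1 rs1 0 100 A G", "1 . 0 200 C T", "2 . 0 300 G A", "1 rs1 0 100 A G"]
    print(conflicting_ids(demo))  # {'.': [('1', 200), ('2', 300)]}
```

The check only lists the offending IDs; it does not rename or remove anything.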

For multiallelic sites, Plink 1.9 defaults to keeping only the reference allele and the most common alternate allele; any call involving a lower-frequency alt allele is treated as missing data.  If you want such sites to be entirely skipped, you need to add the --biallelic-only flag.

Reply by chrchang523, 4.2 years ago
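A toy Python model of the multiallelic handling described in the reply above, written only to make the rule concrete; it is not PLINK's code. Genotypes are VCF `GT` allele-index pairs (0 = REF), `n_alt` is the number of ALT alleles listed for the site, `biallelic_only=True` stands in for the `--biallelic-only` flag, and breaking ties between equally common alt alleles by lowest index is an assumption.

```python
from collections import Counter
from typing import Optional

Genotype = Optional[tuple[int, int]]  # VCF GT allele indices (0 = REF); None = missing

def load_site(n_alt: int, genotypes: list[Genotype],
              biallelic_only: bool = False) -> Optional[list[Genotype]]:
    """Toy model of the rule above; returns None when the variant is skipped."""
    if n_alt <= 1:
        return list(genotypes)  # biallelic site: nothing to do
    if biallelic_only:
        return None  # like --biallelic-only: drop the whole variant
    counts = Counter(a for gt in genotypes if gt is not None for a in gt if a != 0)
    # Most common ALT allele; ties go to the lowest index (assumption)
    keep = max(range(1, n_alt + 1), key=lambda a: (counts[a], -a))
    # Calls involving any other ALT allele become missing
    return [gt if gt is not None and all(a in (0, keep) for a in gt) else None
            for gt in genotypes]

if __name__ == "__main__":
    gts = [(0, 0), (0, 1), (1, 2), (2, 2), (0, 2), None]
    print(load_site(2, gts))  # [(0, 0), None, None, (2, 2), (0, 2), None]
    print(load_site(2, gts, biallelic_only=True))  # None
```

In the demo, allele 2 is the more common ALT, so the calls carrying allele 1 are set to missing while the site itself is kept; with `biallelic_only=True` the whole site disappears.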

I see. So if I understand you correctly, even though the log says it is filtering for biallelic sites, it is really just setting calls involving the third allele to missing? And if I use the --biallelic-only flag, it will skip that SNV entirely?

By browsing Google and your threads, I found the genetics for fun blog, which has exactly what I needed. It was not easy to find despite its obvious title, so I'll post it here for future reference:

http://apol1.blogspot.com/2014/11/best-practice-for-converting-vcf-files.html

Reply by nchuang, 4.2 years ago

Also, what are your thoughts on using GATK's VariantsToBinaryPed for VCF-to-PLINK conversion?

Reply by nchuang, 4.2 years ago
Kevin Blighe wrote 14 months ago:

Follow my tutorial for best practices on doing this: "Produce PCA bi-plot for 1000 Genomes Phase III - Version 2"

Kevin

Powered by Biostar version 2.3.0