Question: Combine/Merge Two Phased VCF Files For PLINK Analysis
7.6 years ago by
michealsmith760 wrote:

I have two already-phased VCF files (one for patients and one for controls) and would like to merge them.

I tried vcf-phased-join, but it requires the same columns in both files, i.e. the number of individuals must be equal, which seems odd. Then I tried GATK -T CombineVariants, and it works!

But my questions are:

  1. Is it OK to simply combine/merge two PHASED VCFs? (The ideal approach, as I see it, would be to combine the patient and control BAMs, call variants jointly, and then phase all SNPs together with GATK ReadBackedPhasing, but processing those BAM files would be too painful. The controls here are actually 1000 Genomes data.) After merging, many genotype fields will be missing; is this OK for downstream plink analysis?

  2. How does plink handle missing genotypes, as well as unphased genotypes?

  3. Should I use only SNPs for plink, or is it OK to include indels as well?

I'm a beginner with plink, so I'm confused. Many thanks!


See these posts: "How can I merge a large amount of VCF files?" and "Combining data of multiple VCFs into one".

written 7.6 years ago by zx8754
7.6 years ago by
United States
venks720 wrote:


I have done something like this before for plink analysis. We have always used multi-sample VCF files for plink. It is better to call variants for both sets of samples together so that you do not introduce bias. This should output a multi-sample VCF file, which you can use for further plink analysis.
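
To illustrate why joint calling matters, here is a minimal Python sketch (not what GATK CombineVariants does internally, just a toy model) of merging two site-keyed genotype tables. The function names `merge_sites` and `missing_rate`, the data layout, and the example sites are all made up for illustration. Any site present in only one file gets `./.` for every sample from the other file, even though those samples may simply be homozygous reference at that position.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: a site is (chrom, pos, ref, alt) and maps to one
# genotype string per sample, in sample order.
Site = Tuple[str, int, str, str]


def merge_sites(a: Dict[Site, List[str]], n_a: int,
                b: Dict[Site, List[str]], n_b: int,
                missing: str = "./.") -> Dict[Site, List[str]]:
    """Union the sites of two tables; pad absent samples with `missing`."""
    merged: Dict[Site, List[str]] = {}
    for site in sorted(set(a) | set(b)):
        left = a.get(site, [missing] * n_a)
        right = b.get(site, [missing] * n_b)
        merged[site] = left + right
    return merged


def missing_rate(genotypes: List[str], missing: str = "./.") -> float:
    """Fraction of samples with no genotype call at one site."""
    return sum(g == missing for g in genotypes) / len(genotypes)


# Toy data: two patients, one control (assumed values, not real calls).
patients = {
    ("1", 100, "A", "G"): ["0|1", "1|1"],
    ("1", 200, "C", "T"): ["0|0", "0|1"],
}
controls = {
    ("1", 100, "A", "G"): ["0|0"],
    ("1", 300, "G", "A"): ["1|0"],
}

merged = merge_sites(patients, 2, controls, 1)
for site, gts in merged.items():
    print(site, gts, f"missing={missing_rate(gts):.2f}")
```

In this toy case only the shared site at position 100 is fully genotyped. The other two sites are missing in exactly one group, so missingness is perfectly confounded with case/control status, which is the kind of bias that joint calling avoids.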

Good luck
