Question: Combine/Merge Two Phased Vcf Files For Plink Analysis
I have two already phased VCF files (one patient and one control) and would like to merge them.

I tried vcf-phased-join, but it requires both files to have the same sample columns, i.e. the same number of individuals, which seems odd. Then I tried GATK -T CombineVariants, and it works!

But my questions are:

  1. Is it OK to simply combine/merge two PHASED VCFs? (The optimal way, in my mind, would be to combine the patient and control BAMs, call variants jointly, and phase all SNPs together using GATK ReadBackedPhasing, but processing these BAM files would be too painful. The controls here are actually 1000 Genomes data.) After merging, many genotype fields will be missing; is this OK for downstream plink analysis?

  2. How does plink handle missing genotypes and unphased genotypes?

  3. Should I use only SNPs for plink, or is it OK to include indels as well?

I'm a plink beginner, so I'm confused. Many thanks!

See these posts: How can I merge a large amount of VCF files? and Combining data of multiple VCFs into one.

written 7.6 years ago by zx87549.9k
I have done something like this before for plink analysis. We have always used multi-sample VCF files for plink. It is better to call the variants for both sample sets together so that you do not introduce bias. This should produce a multi-sample VCF file that you can use for further plink analysis.
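To see where the bias comes from, here is a toy sketch in Python of what a naive merge of two separately called files does. It is only an illustration under my own assumptions: `merge_sites`, `missing_rate`, and the tiny genotype tables are made up for this example and are not part of plink, GATK, or vcftools. The assumption is that a site absent from one file gets a missing genotype (`./.`) for every sample of that file.

```python
from typing import Dict, List, Tuple

# Hypothetical site key for this sketch: (chrom, pos, ref, alt)
Site = Tuple[str, int, str, str]
MISSING = "./."


def merge_sites(a: Dict[Site, List[str]], n_a: int,
                b: Dict[Site, List[str]], n_b: int) -> Dict[Site, List[str]]:
    """Take the union of sites; samples of a file lacking a site get './.'."""
    merged = {}
    for site in sorted(set(a) | set(b)):
        merged[site] = a.get(site, [MISSING] * n_a) + b.get(site, [MISSING] * n_b)
    return merged


def missing_rate(genotypes: List[str]) -> float:
    """Fraction of missing calls, treating '.|.' the same as './.'."""
    n_missing = sum(gt.replace("|", "/") == MISSING for gt in genotypes)
    return n_missing / len(genotypes)


# Toy data: 2 patients and 3 controls, called separately.
patients = {
    ("1", 100, "A", "G"): ["0|1", "1|1"],
    ("1", 200, "C", "T"): ["0|0", "0|1"],
}
controls = {
    ("1", 100, "A", "G"): ["0|0", "0|1", "0|0"],
    ("1", 300, "G", "A"): ["1|0", "0|0", "0|0"],
}

merged = merge_sites(patients, 2, controls, 3)
for site, gts in merged.items():
    print(site, gts, f"missing={missing_rate(gts):.2f}")
```

In this toy data the site at position 200 is missing in all three controls and the site at position 300 is missing in both patients, so missingness lines up exactly with case/control status. A site absent from one file may simply be homozygous reference there, which is why joint calling avoids the problem.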

Good luck

written 7.6 years ago by venks720
