Question

Can't find the "classic" error in merged vcf files with vcf-merge.

7.0 years ago

msimmer92 ▴ 300

Hello everyone! Let me provide a bit of context first. I'm performing a principal component analysis on a two-group sample (groups 1 and 2). My data are two separate VCF files, one per group. To run the PCA in Plink, I needed a single VCF file containing the individuals from both groups. For merging, I used the vcf-merge command from vcftools, which seemed to run correctly.

The problem: after merging both files, running the PCA and plotting the result in R, I noticed the graph looked odd (you can see it below). A labmate told me "ohh, that's a classic merging error, I've seen it before... but I don't remember much right now. See if you did something wrong in the merging". I'm new to bioinformatics, so I have looked again and again but can't find the error. The commands ran smoothly at each step, and I don't have enough knowledge yet to spot the mistake. Given what my labmate said, I decided to post the question here, since it seems to be a "classic rookie mistake".

Below is every step of the process, followed by the final graph, in case you can spot the problem.

./bgzip group1.vcf
./tabix group1.vcf.gz
./bgzip group2.vcf
./tabix group2.vcf.gz

vcf-merge group1.vcf.gz group2.vcf.gz | bgzip -c > bothmerge.vcf.gz

./plink --vcf bothmerge.vcf.gz --pca --out bothmergepca

(Then, loading the bothmergepca.eigenvec file in R, I plotted the first principal component against the second one).
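One possible sanity check on the merge step above, sketched under the assumption that both inputs are plain or bgzip-compressed VCF files, is to count how many variant sites (CHROM and POS) appear in only one file versus in both. The function names `list_sites` and `count_shared_sites` are hypothetical helpers written for illustration; they are not part of vcftools or Plink.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of vcftools or Plink).
# Use the C locale so that sort and comm agree on ordering.
export LC_ALL=C

# Print the unique CHROM:POS keys of a VCF, skipping header lines.
list_sites() {
    # gzip -cdf decompresses gzip/bgzip input and passes plain text through
    gzip -cdf "$1" | awk -F'\t' '!/^#/ {print $1 ":" $2}' | sort -u
}

# Report how many sites are unique to each file and how many are shared.
count_shared_sites() {
    local a b
    a=$(mktemp)
    b=$(mktemp)
    list_sites "$1" > "$a"
    list_sites "$2" > "$b"
    echo "only_in_first=$(comm -23 "$a" "$b" | wc -l | tr -d ' ')"
    echo "only_in_second=$(comm -13 "$a" "$b" | wc -l | tr -d ' ')"
    echo "shared=$(comm -12 "$a" "$b" | wc -l | tr -d ' ')"
    rm -f "$a" "$b"
}

# Example call (file names taken from the commands above):
# count_shared_sites group1.vcf.gz group2.vcf.gz
```

The sketch matches sites by position only and ignores REF/ALT alleles, so it gives a rough picture of the overlap between the two inputs rather than an exact comparison.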

I expected the graph to look like a cloud of 2000 dots. Note: I have done PCA and plotted the results in R before, so I'm more familiar with those steps and am fairly certain the mistake is not there.

You can see the graph here: https://ibb.co/cRMS5k

I hope someone can help me, or at least give me a hint. Thank you for your time!

vcf vcf-merge • 1.6k views
