Can't find the "classic" error in merged vcf files with vcf-merge.
7.0 years ago
msimmer92 ▴ 300

Hello everyone! Let me provide a bit of context first. I'm performing a principal component analysis (PCA) on a sample made up of two groups (groups 1 and 2). My data are two separate VCF files, one for each group. To run the PCA in Plink, I needed to generate a single VCF file containing the individuals from both groups. For merging, I used the vcf-merge command from vcftools, which seemed to run correctly.

The problem: after merging both files, running the PCA and plotting the result in R, I noticed the graph looked odd (you can see it below), and a labmate told me "ohh, that's a classic merging error, I've seen it before... but I don't remember much right now. See if you did something wrong in the merging". I'm new to bioinformatics, so I have looked again and again but I can't find the error. The commands ran smoothly at each step, and I don't have enough knowledge yet to spot the mistake. Given what my labmate said, I decided to post the question here, since it seems to be a "classic rookie mistake".

Here is every step of the process, along with the final graph, in case you can spot the problem.

./bgzip group1.vcf
./tabix group1.vcf.gz
./bgzip group2.vcf
./tabix group2.vcf.gz

vcf-merge group1.vcf.gz group2.vcf.gz | bgzip -c > bothmerge.vcf.gz

./plink --vcf bothmerge.vcf.gz --pca --out bothmergepca

(Then I loaded the bothmergepca.eigenvec file in R and plotted the first principal component against the second one.)

I expected the graph to look like a cloud of 2000 dots. Note: I have run PCA and visualized it in R before, so I'm more familiar with those steps and I'm fairly certain the mistake is not there.

You can see the graph here: https://ibb.co/cRMS5k

I hope someone can help me, or at least give me a hint. Thank you for your time!
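
One thing that might be worth checking in the merged file is how many genotypes are missing for each sample. This is only a sketch under an assumption: if the two input files do not cover exactly the same sites, samples from one group could end up with missing genotypes at sites that are present only in the other group. The function name `missing_per_sample` is hypothetical (it is not part of vcftools or Plink), and it uses only awk, so it may be slow on large files.

```shell
# Hypothetical helper: for each sample in a VCF read from stdin, print
# the sample name, the number of sites with a missing genotype
# (".", "./." or ".|."), and the total number of sites, tab-separated.
missing_per_sample() {
  awk -F'\t' '
    /^##/ { next }                       # skip meta-information lines
    /^#CHROM/ {                          # header line: remember sample names
      for (i = 10; i <= NF; i++) name[i] = $i
      nsamp = NF
      next
    }
    {
      sites++
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                # GT is the first FORMAT field
        if (f[1] == "." || f[1] == "./." || f[1] == ".|.") miss[i]++
      }
    }
    END {
      for (i = 10; i <= nsamp; i++)
        printf "%s\t%d\t%d\n", name[i], miss[i] + 0, sites
    }
  '
}

# Possible usage on the merged file from the steps above:
# gzip -dc bothmerge.vcf.gz | missing_per_sample
```

If the missing counts differ sharply between the samples of group 1 and those of group 2, that would point at the merge step rather than at the PCA or the plotting.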

vcf vcf-merge • 1.6k views