Question: Can't find the "classic" error in VCF files merged with vcf-merge
Posted 8 months ago by melnuesch10:

Hello everyone! Let me provide a bit of context first. I'm running a principal component analysis (PCA) on a sample made up of two groups (groups 1 and 2). My data come as two separate VCFs, one per group. To run the PCA in PLINK, I needed a single VCF containing the individuals from both groups. For the merge I used the vcf-merge command from VCFtools, which seemed to run correctly.

The problem: after merging the two files, running the PCA and plotting the result in R, I noticed the plot looked odd (you can see it below). A labmate told me: "Oh, that's a classic merging error, I've seen it before... but I don't remember much right now. Check whether you did something wrong in the merging." I'm new to bioinformatics, and I have checked again and again, but I can't find the error. Every step ran without errors, and I don't have enough experience yet to spot the mistake. Since my labmate described it as a classic rookie mistake, I decided to post the question here.

Here is every step of the process, so you can check for the problem, followed by the final graph.

./bgzip group1.vcf
./tabix group1.vcf.gz
./bgzip group2.vcf
./tabix group2.vcf.gz

vcf-merge group1.vcf.gz group2.vcf.gz | bgzip -c > bothmerge.vcf.gz
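
One way to see how many sites vcf-merge had to combine from only one input is to count the CHROM:POS positions the two files share. This is a minimal sketch under stated assumptions, not part of VCFtools: `vcf_sites` and `site_overlap` are hypothetical helper names, and matching on position alone ignores differing REF/ALT alleles. A shared count near zero would also expose mismatched chromosome naming (e.g. `chr1` in one file and `1` in the other).

```shell
# Hypothetical helper (not part of VCFtools): print the sorted, unique
# CHROM:POS keys of a plain or bgzipped VCF. bgzip output is gzip-compatible.
vcf_sites() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac | awk -F'\t' '!/^#/ { print $1 ":" $2 }' | LC_ALL=C sort -u
}

# Hypothetical helper: count sites shared by two VCFs and sites unique to each.
# Matches on CHROM:POS only; REF/ALT differences are ignored.
site_overlap() {
  a=$(mktemp); b=$(mktemp)
  vcf_sites "$1" > "$a"
  vcf_sites "$2" > "$b"
  printf 'shared\t%s\n' "$(LC_ALL=C comm -12 "$a" "$b" | awk 'END { print NR }')"
  printf 'only_1\t%s\n' "$(LC_ALL=C comm -23 "$a" "$b" | awk 'END { print NR }')"
  printf 'only_2\t%s\n' "$(LC_ALL=C comm -13 "$a" "$b" | awk 'END { print NR }')"
  rm -f "$a" "$b"
}
```

Usage: `site_overlap group1.vcf.gz group2.vcf.gz`. If `only_1` and `only_2` are large compared with `shared`, the merged file contains many sites that are genotyped in just one group.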

./plink --vcf bothmerge.vcf.gz --pca --out bothmergepca
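
If many sites are present in only one group, the merged file holds missing genotypes for every sample of the other group at those sites, and missingness alone could then drive the principal components. That is a plausible candidate for the "classic" error, not something confirmed in this thread. A minimal sketch to count missing genotypes per sample in the merged VCF, assuming GT is the first FORMAT field (the VCF specification requires this when GT is present); `count_missing` is a hypothetical helper name. PLINK 1.9's `--missing` flag reports similar per-sample rates in its `.imiss` output.

```shell
# Hypothetical helper: per-sample count of missing genotypes in a plain or
# bgzipped VCF. Prints: sample <TAB> missing <TAB> total_sites
count_missing() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac | awk -F'\t' '
    /^##/ { next }
    /^#CHROM/ { n = NF; for (i = 10; i <= n; i++) name[i] = $i; next }
    {
      sites++
      for (i = 10; i <= n; i++) {
        # GT is the first FORMAT field, so keep only the text before the first ":"
        split($i, f, ":")
        if (f[1] == "." || f[1] == "./." || f[1] == ".|.") miss[i]++
      }
    }
    END { for (i = 10; i <= n; i++) printf "%s\t%d\t%d\n", name[i], miss[i] + 0, sites + 0 }
  '
}
```

Usage: `count_missing bothmerge.vcf.gz`. If samples from one group show far more missing genotypes than the other, that difference alone could separate the groups on the PCA plot.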

(Then I loaded the bothmergepca.eigenvec file in R and plotted the first principal component against the second.)
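
To check whether the odd pattern simply separates the two input files, each row of the `.eigenvec` file can be tagged with the group its sample came from before plotting. A sketch, assuming PLINK 1.9's headerless `.eigenvec` layout (FID, IID, then one column per PC) and sample IDs without underscores, so that PLINK copies each VCF sample ID into the IID column; `vcf_samples` and `label_eigenvec` are hypothetical helper names.

```shell
# Hypothetical helper: print the sample IDs from a VCF header, one per line.
vcf_samples() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Hypothetical helper: append a group label to each row of a PLINK 1.9
# .eigenvec file, matching on the IID column (column 2).
# $1 = group1 VCF, $2 = group2 VCF, $3 = .eigenvec file
label_eigenvec() {
  s1=$(mktemp); s2=$(mktemp)
  vcf_samples "$1" > "$s1"
  vcf_samples "$2" > "$s2"
  awk -v f1="$s1" -v f2="$s2" '
    BEGIN {
      while ((getline id < f1) > 0) grp[id] = "group1"
      while ((getline id < f2) > 0) grp[id] = "group2"
    }
    {
      g = ($2 in grp) ? grp[$2] : "unknown"
      print $0 "\t" g
    }' "$3"
  rm -f "$s1" "$s2"
}
```

Usage: `label_eigenvec group1.vcf.gz group2.vcf.gz bothmergepca.eigenvec > bothmergepca.labeled.eigenvec`. The last column can then color the points in R; rows labeled `unknown` indicate that PLINK rewrote the sample IDs.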

I expected the plot to look like a cloud of 2000 points. Note: I have run PCAs and plotted them in R before, so I'm more familiar with those steps and fairly certain the mistake is not there.

You can see the graph here:

I hope someone can help me, or at least give me a hint. Thank you for your time!
