Question: vcf-concat error "column names do not match"
4.4 years ago by Scott (Canada)
Scott wrote:

I am new to VCF tools and having trouble combining VCF files from different sub-populations.

I know it is possible to download such data already combined from the 1000 Genomes data slicer tool, but it cannot handle as many populations in one file as I sometimes need.

I am using the vcftools vcf-concat function to achieve this, but I am getting the error message below.

I am running OS X and using vcftools through the terminal.

Code:

-------------------------

./vcf-concat CEU_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz FIN_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz > test_out.vcf

The column names do not match; the column "NA06984" no present in [FIN_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz].
 at ./vcf-concat line 32, <__ANONIO__> line 251.
    main::error('The column names do not match; the column "NA06984" no presen...') called at ./vcf-concat line 170
    main::concat('HASH(0x7fd6da0050c8)') called at ./vcf-concat line 12

---------------------------

Both of the files have one site and 99 individuals. 

Thank you!

Tags: snp, vcftools, vcf-concat
modified 4.4 years ago by Devon Ryan • written 4.4 years ago by Scott
4.4 years ago by Devon Ryan (Freiburg, Germany)
Devon Ryan wrote:

You want to merge, not concatenate, them. So use vcf-merge instead.
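
For reference, a minimal sketch of what that could look like in the shell. It assumes that vcf-merge, bgzip and tabix are all on your PATH and that the inputs are bgzip-compressed (plain gzip files would need recompressing first). The function name merge_populations and the output filename are made up for illustration.

```shell
#!/bin/sh
# Sketch: merge single-population VCFs so each population's samples
# become additional columns at the shared sites.
# Assumes vcf-merge, bgzip and tabix are installed and on PATH.
merge_populations() {
    out=$1; shift
    for f in "$@"; do
        # vcf-merge expects bgzip-compressed inputs with a tabix index,
        # so build the index if it is not there yet.
        [ -e "$f.tbi" ] || tabix -p vcf "$f" || return 1
    done
    # Write the merged VCF, compressed, to the requested output file.
    vcf-merge "$@" | bgzip -c > "$out"
}
```

With the files from the question, the call might be `merge_populations CEU_FIN_merged.vcf.gz CEU_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz FIN_filtered.ALL.chr10.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz`.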

written 4.4 years ago by Devon Ryan

Are you sure? Each file has genotypes for the same single marker. I want a "super-population" of the two populations combined, but still only the single marker. 

written 4.4 years ago by Scott

Yes, the last thing you would ever want to do would be to concatenate datasets like that...it'd produce completely useless results. BTW, I suspect part of your confusion arises from misunderstanding the word "concatenate". If you had two files like:

file1

position1    pop1_sample1 pop1_sample2 pop1_sample3

and file2:

position1    pop2_sample1 pop2_sample2 pop2_sample3

and concatenated them, then you'd duplicate each shared position:

position1    pop1_sample1 pop1_sample2 pop1_sample3
position1    pop2_sample1 pop2_sample2 pop2_sample3

The resulting file isn't a valid VCF. What you want is to add the individual sample calls as new columns, which is what merging does.
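
The difference can be tried out on toy data with standard shell tools. This is only a sketch: the two files are simplified tab-separated stand-ins rather than real VCFs, `cat` and `join` play the roles of concatenating and merging, and the `demo_dir` variable and file names are made up for the illustration.

```shell
#!/bin/sh
# Toy tab-separated files standing in for two single-site VCFs (not real VCF).
demo_dir=$(mktemp -d)
tab=$(printf '\t')
printf 'position1\tpop1_sample1\tpop1_sample2\tpop1_sample3\n' > "$demo_dir/file1.tsv"
printf 'position1\tpop2_sample1\tpop2_sample2\tpop2_sample3\n' > "$demo_dir/file2.tsv"

# Concatenating stacks the rows, so the shared position appears twice.
cat "$demo_dir/file1.tsv" "$demo_dir/file2.tsv" > "$demo_dir/concat.tsv"

# Merging joins on the position column: one row, samples side by side.
join -t "$tab" "$demo_dir/file1.tsv" "$demo_dir/file2.tsv" > "$demo_dir/merged.tsv"

echo "concatenated:"; cat "$demo_dir/concat.tsv"
echo "merged:";       cat "$demo_dir/merged.tsv"
```

The concatenated file ends up with two rows for position1, while the merged file has a single row carrying all six sample columns.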

written 4.4 years ago by Devon Ryan

Hi Devon. Thanks for the explanation. I had interpreted merge and concatenate the opposite way around. This might be because I have been using the .ped format a lot, which has individual IDs in the first column and markers/positions in the adjacent columns. Thanks again!

written 4.4 years ago by Scott

Ah, that'd certainly cause the confusion!

written 4.4 years ago by Devon Ryan