Question: bcftools merge deleting all genotypes from second file
2.7 years ago by
karl.fetter0 wrote:

Hello All,

I'm trying to merge two vcf.gz files and I'm running into strange behavior with bcftools merge. All of the positions in my second file are being set to missing (./.) during the merge. Does anyone have any tips for how I might fix this problem?

Here is my command:

bcftools merge -O v -m file1.vcf.gz file2.vcf.gz > out.vcf

Thanks for your help!

bcftools merge • 927 views
ADD COMMENTlink written 2.7 years ago by karl.fetter0

All of the positions in my second file are being set to missing (./.) during the merge

All? Can you confirm this? Is there any position that shouldn't be set to './.' (unknown)? See also: https://github.com/samtools/bcftools/issues/402

ADD REPLYlink written 2.7 years ago by Pierre Lindenbaum131k

All? Can you confirm this?

Thanks for the suggestion. I can confirm this. I looked more closely at the files: both inputs contain about 85K loci, but the output contains 171K. In effect the two files are being concatenated, with the genotypes from file2 set to unknown in the top half of the output and the genotypes from file1 set to unknown in the bottom half.

I thought the problem might be the ID column, which is '.' for every position in file2 but complete in file1, where it reads CHR_POS. I added IDs to file2 to see whether that was the cause, but unfortunately that did not fix it, so I'm back to square one. Do you know which field bcftools uses to merge? My files are not in the same order. Perhaps that's the problem?

thx.

ADD REPLYlink written 2.7 years ago by karl.fetter0
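
A doubled record count like the one described above would be consistent with the two files sharing no sites at all. As far as I understand it, bcftools merge pairs records by chromosome and position (or by ID with -m id), so a mismatch such as "chr1" in one file and "1" in the other could keep every record separate. The following is a minimal sketch for checking this with standard Unix tools only. The function names site_keys and shared_sites are hypothetical helpers, not part of bcftools.

```shell
# Hypothetical helpers: count the sites (CHROM + POS) that two gzipped or
# bgzipped VCF files have in common. Uses only gzip, grep, cut, sort, comm.

site_keys() {
    # Print one "CHROM<TAB>POS" key per record, skipping header lines.
    gzip -dc "$1" | grep -v '^#' | cut -f1,2 | LC_ALL=C sort -u
}

shared_sites() {
    keys_a=$(mktemp) || return 1
    keys_b=$(mktemp) || return 1
    site_keys "$1" > "$keys_a"
    site_keys "$2" > "$keys_b"
    # comm -12 keeps only the lines present in both sorted inputs.
    LC_ALL=C comm -12 "$keys_a" "$keys_b" | wc -l | tr -d ' '
    rm -f "$keys_a" "$keys_b"
}

# Usage (file names taken from the question):
#   shared_sites file1.vcf.gz file2.vcf.gz
```

If the reported count is 0 or far below the roughly 85K loci expected, comparing the first column of both files (for example the output of site_keys piped to head) should show whether the chromosome names or coordinates differ.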

My files are not in the same order. Perhaps that's the problem?

It's always better to work with sorted files, so you should give that a try. The other thing I notice is that you are using -m in your command, but its argument is missing, isn't it?

-m none   ..  no new multiallelics, output multiple records instead
-m snps   ..  allow multiallelic SNP records
-m indels ..  allow multiallelic indel records
-m both   ..  both SNP and indel records can be multiallelic
-m all    ..  SNP records can be merged with indel records
-m id     ..  merge by ID

fin swimmer

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by finswimmer14k
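
Putting the two suggestions in this reply together, here is a minimal sketch that sorts and indexes both inputs and then merges them with an explicit value for -m. The wrapper name merge_two_vcfs and the temporary file names are hypothetical. The mode defaults to both in this sketch only as an illustration, and any value from the list above can be passed instead. Whether this resolves the missing genotypes in the original question is not confirmed in the thread.

```shell
# Hypothetical wrapper: sort and index two VCF files, then merge them
# with an explicit -m mode. Arguments: in1 in2 out [mode]
merge_two_vcfs() {
    mode=${4:-both}                      # illustrative default, see list above
    tmpdir=$(mktemp -d) || return 1

    # Sort each input by position, write bgzipped output, and index it.
    bcftools sort -O z -o "$tmpdir/1.vcf.gz" "$1" &&
    bcftools index -t "$tmpdir/1.vcf.gz" &&
    bcftools sort -O z -o "$tmpdir/2.vcf.gz" "$2" &&
    bcftools index -t "$tmpdir/2.vcf.gz" &&
    # Merge the sorted inputs into an uncompressed VCF (-O v).
    bcftools merge -m "$mode" -O v -o "$3" "$tmpdir/1.vcf.gz" "$tmpdir/2.vcf.gz"
    status=$?

    rm -rf "$tmpdir"
    return $status
}

# Usage (file names taken from the question):
#   merge_two_vcfs file1.vcf.gz file2.vcf.gz out.vcf none
```

Writing the output with -o rather than a shell redirect keeps the command's arguments unambiguous, so the token after -m is clearly the merge mode and not an input file.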