bcftools merge fails: Only fixed-length vectors are supported
4.0 years ago

Dear community members,

I have a problem: I need to create a multi-sample VCF from thousands of VCF files. They were created with FreeBayes, and the techniques I usually use do not work. For example, when I try to use

bcftools merge

it tells me:

Only fixed-length vectors are supported with -i sum:DP

I am not that proficient with the VCF format, and this error message is totally cryptic to me. Do you have an idea why it might happen? An example line from my VCF files looks like:

chr1    911595  .       A       G       2911    .       MQM=60  GT:DP:AO        1/1:93:93

I had an idea that it may be caused by multi-allelic sites - but bcftools in theory should be able to deal with them...

Any advice on how to create a multi-sample VCF is appreciated! (I used bcftools merge several times with GATK output and it worked, but now I am stuck...)

Googling did not help.

Command line used:

/mnt/share/opt/bcftools-1.9/bcftools merge sample1.vcf.gz  sample2.vcf.gz --merge none > merged.cases.vcf
4.0 years ago
Carambakaracho ★ 3.2k

As you don't have DP values in the INFO column, switch off the default behaviour of summing up the DP values in the INFO column:

bcftools merge -i -

This is untested but that's how I understand the help:

bcftools merge --help
   -i, --info-rules <tag:method,..>   rules for merging INFO fields (method is one of sum,avg,min,max,join) or "-" to turn off the default [DP:sum,DP4:sum]
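Put together with the command from the question, a minimal and equally untested sketch could look like the following. The directory name vcfs, the list file vcf_list.txt and the compressed output name are assumptions for illustration, not something from the original post; the -l/--file-list option is used so that thousands of inputs do not have to be typed on the command line.

```shell
#!/bin/sh
# Hypothetical layout: per-sample FreeBayes VCFs sit in ./vcfs as *.vcf.gz
list=vcf_list.txt
out=merged.cases.vcf.gz

# Collect the input files, one path per line
: > "$list"
for f in vcfs/*.vcf.gz; do
  [ -e "$f" ] || continue   # skip the literal pattern if nothing matches
  echo "$f" >> "$list"
done

if command -v bcftools >/dev/null 2>&1 && [ -s "$list" ]; then
  # bcftools merge expects bgzip-compressed, indexed inputs
  while read -r f; do
    bcftools index -f "$f"
  done < "$list"
  # "-i -" turns off the default INFO rules (DP:sum,DP4:sum)
  bcftools merge -i - --merge none -l "$list" -O z -o "$out"
else
  echo "skipping merge: bcftools not found or no input files listed"
fi
```

If some INFO tags should still be combined, -i also accepts explicit tag:method rules, as the help text above shows.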

Thanks, I will try! I honestly checked the manual, but somehow I did not manage to find this...


Well, now it complains about the header:

Could not parse the header line: "##SAMPLE=<ID>,Gender=F,IsTumor=No...etc etc"

But that is a separate question; for the original one, the answer worked!


It looks like the header has dash characters and bcftools does not like them. I will try vcftools vcf-merge instead - it is just not worth it to rewrite all the VCFs because of this...


Sorry, I just have to add something here in case somebody else faces the same problem:

bcftools complains, but it does the job. Wow, I am impressed - I have a multi-sample VCF despite multiple error messages.


Well, if the SAMPLE line in the VCF is exactly as you pasted it above, the line is invalid. It should be something like

##SAMPLE=<ID=Patient_XYZ,Gender=F,IsTumor=No>

See the VCF specification (v4.3), section 1.4.8, Sample Field Format. I'd recommend taking a closer look at the merged VCF, just to make sure you'll be able to trace the individual samples back after merging.
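A quick way to spot such lines is to pull the header out and flag every SAMPLE line that does not have the <ID=name,...> shape. This is only a rough sketch: the toy header below is made up for illustration (it shortens the line from the error message), the file names are assumptions, and the regular expression is a loose check, not a full validation against the spec. With real data the header would come from something like bcftools view -h sample1.vcf.gz > header.txt.

```shell
#!/bin/sh
# Toy header for illustration only; replace with a real one dumped via
#   bcftools view -h sample1.vcf.gz > header.txt
cat > header.txt <<'EOF'
##fileformat=VCFv4.2
##SAMPLE=<ID=Patient_XYZ,Gender=F,IsTumor=No>
##SAMPLE=<ID>,Gender=F,IsTumor=No
EOF

# Keep the SAMPLE lines that do NOT look like ##SAMPLE=<ID=name,key=value,...>
grep '^##SAMPLE=' header.txt \
  | grep -Ev '^##SAMPLE=<ID=[^,>]+(,[^>]*)?>$' > bad_sample_lines.txt || true

# Show the offending lines, if any
cat bad_sample_lines.txt
```

After merging, bcftools query -l on the merged file lists the sample names, which helps to confirm that every input sample can still be told apart. If the header needs repairing, bcftools reheader -h with a corrected header file is one option to look at.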


Thanks a lot! Will do! Somehow we still follow 4.2, but I guess the difference is not big. We use our own processing system (we don't even really use VCFs), which is fine for clinics, but for research it is such a pain...


No, the difference is not big: the changes between 4.2 and 4.3 are mostly semantics in the specs, which are way more explicit in 4.3.

"We use our own processing system (we don't even really use VCFs) which is fine for clinics, but for research it is such a pain..."

Oh, I know that feeling... :-D
