Question: Merge multiple VCF files (same variants, same sample) into one VCF file
Eleanore0 wrote:

Dear all,

I need to manipulate multiple VCF files (containing the same variants and referring to the same sample) so as to merge their INFO fields.

The context.

Say I have the following VCF file (headers not included):

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr13   32903685    .   C   T   7555.77 PASS    .   GT:AD:DP:GQ:PL  0/1:219,340:569:99:7584,0,4763

Now, I create two copies of the same VCF file and annotate each of them with a different annotation source. So, the first one becomes:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr13   32903685    .   C   T   7555.77 PASS    CustomOne=1 GT:AD:DP:GQ:PL  0/1:219,340:569:99:7584,0,4763

while the second one becomes:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr13   32903685    .   C   T   7555.77 PASS    CustomTwo=2 GT:AD:DP:GQ:PL  0/1:219,340:569:99:7584,0,4763

I would like now to merge the aforementioned copies, so as to obtain:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr13   32903685    .   C   T   7555.77 PASS    CustomOne=1;CustomTwo=2 GT:AD:DP:GQ:PL  0/1:219,340:569:99:7584,0,4763

Basically, the result I would like to achieve maintains the same #CHROM, POS, REF, ALT, QUAL, FILTER, FORMAT and sample columns, and merges the contents of the INFO column found in each copy.
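
To make the intended merge concrete, here is a minimal Python sketch of what should happen to the INFO column of one record. The function name `merge_info` and the choice to raise an error when two copies disagree on the same key are assumptions made for illustration, not the behaviour of any existing tool.

```python
def merge_info(info_values):
    """Merge the INFO strings of several copies of one VCF record."""
    merged = {}  # dicts keep insertion order, so keys stay in first-seen order
    for info in info_values:
        for entry in info.split(";"):
            if not entry or entry == ".":
                continue  # "." means "no INFO" in VCF
            key, sep, value = entry.partition("=")
            new_value = value if sep else None  # flags carry no "=value" part
            if key in merged and merged[key] != new_value:
                # Assumption: conflicting annotations are treated as an error.
                raise ValueError(f"conflicting values for INFO key {key!r}")
            merged[key] = new_value
    if not merged:
        return "."
    return ";".join(k if v is None else f"{k}={v}" for k, v in merged.items())


print(merge_info(["CustomOne=1", "CustomTwo=2"]))  # CustomOne=1;CustomTwo=2
```

All other columns are taken unchanged from any one of the copies, since they are identical by construction.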

The solution I tried.

I tried (unsuccessfully) with several options:

  • bcftools merge, but this is meant for merging different samples, while I am working with the same sample
  • bcftools concat, but this concatenates the records of the two VCF files instead of merging their INFO fields
  • SnpSift annotate, but this does not accept more than two files, so I cannot use it when more than two copies have to be merged

My question!

Can you suggest how to proceed?

Thank you for your help.

annotation vcf
modified 4 weeks ago by thondeboer40 • written 12 weeks ago by Eleanore0
trausch760 wrote:

Two INFO fields with the same name "Custom" are not allowed, but I think recent bcftools versions can relabel INFO fields:

bcftools annotate -a custom1.vcf.gz -c INFO/CustomImported:=INFO/Custom custom2.vcf.gz

written 12 weeks ago by trausch760

Yeah, sorry, I gave a wrong example. I am going to re-edit the question to use two different INFO fields. So, does this command allow multiple files too?

modified 12 weeks ago • written 12 weeks ago by Eleanore0

Maybe there is a more elegant solution but pipes should work:

zcat custom1.vcf.gz | bcftools annotate -a custom2.vcf.gz -c INFO/CustomTwo - | bcftools annotate -a custom3.vcf.gz -c INFO/CustomThree -
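
For more than a handful of copies, the same chain can be generated rather than typed by hand. This is only a sketch: the helper name `build_annotate_pipeline` is hypothetical, and it assumes each extra copy contributes exactly one known INFO tag. It uses only the `-a` and `-c` options shown in the command above.

```python
import shlex


def build_annotate_pipeline(first_vcf, annotations):
    """Return a shell pipeline with one `bcftools annotate` stage per extra copy.

    annotations: list of (vcf_path, info_tag) pairs, one per annotated copy.
    """
    stages = ["zcat " + shlex.quote(first_vcf)]
    for vcf_path, info_tag in annotations:
        stages.append(
            "bcftools annotate -a {} -c INFO/{} -".format(
                shlex.quote(vcf_path), info_tag
            )
        )
    return " | ".join(stages)


command = build_annotate_pipeline(
    "custom1.vcf.gz",
    [("custom2.vcf.gz", "CustomTwo"), ("custom3.vcf.gz", "CustomThree")],
)
print(command)
```

The printed string is the pipeline above, so it still starts one annotation process per extra copy.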

written 12 weeks ago by trausch760

This is the solution I applied at first, but it does not scale, since it launches a new annotation process for every additional copy (N-1 processes for N copies). Isn't there a tool that does this operation for me without launching several annotation processes?

written 12 weeks ago by Eleanore0
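
To illustrate the kind of single-pass operation I mean, here is a rough Python sketch, not an existing tool. It assumes that all copies list the same records in the same order (true here, since they are copies of one file), that INFO is the eighth column, and that on a duplicate key the first value seen wins. All function names are made up, and `CustomThree=3` is an invented third annotation.

```python
import io
from itertools import zip_longest


def read_header(handle):
    """Consume the header; return (list of ## meta lines, #CHROM line)."""
    meta = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##"):
            meta.append(line)
        elif line.startswith("#CHROM"):
            return meta, line
        else:
            break
    raise ValueError("missing #CHROM header line")


def records(handle):
    """Yield the tab-separated fields of each remaining data line."""
    for line in handle:
        line = line.rstrip("\n")
        if line:
            yield line.split("\t")


def merge_info(values):
    """Union of INFO entries; the first occurrence of each key wins."""
    merged = {}
    for info in values:
        for entry in info.split(";"):
            if entry and entry != ".":
                merged.setdefault(entry.partition("=")[0], entry)
    return ";".join(merged.values()) or "."


def merge_copies(handles, out):
    """Merge the INFO columns of several annotated copies in one pass."""
    headers = [read_header(h) for h in handles]
    seen = set()
    for meta, _ in headers:
        for line in meta:
            if line not in seen:  # write each meta line (e.g. ##INFO) once
                seen.add(line)
                out.write(line + "\n")
    out.write(headers[0][1] + "\n")
    for rows in zip_longest(*(records(h) for h in handles)):
        if any(r is None for r in rows):
            raise ValueError("copies have different numbers of records")
        first = rows[0]
        if any(r[:5] != first[:5] for r in rows):  # CHROM, POS, ID, REF, ALT
            raise ValueError("copies disagree at " + ":".join(first[:2]))
        merged = list(first)
        merged[7] = merge_info(r[7] for r in rows)
        out.write("\t".join(merged) + "\n")


# Demo on in-memory copies of the record from the question.
HEADER = ("##fileformat=VCFv4.2\n"
          "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n")
ROW = ("chr13\t32903685\t.\tC\tT\t7555.77\tPASS\t{}\t"
       "GT:AD:DP:GQ:PL\t0/1:219,340:569:99:7584,0,4763\n")
copies = [io.StringIO(HEADER + ROW.format(info))
          for info in ("CustomOne=1", "CustomTwo=2", "CustomThree=3")]
result = io.StringIO()
merge_copies(copies, result)
print(result.getvalue(), end="")
```

For real files, the handles could come from `open(path)` or `gzip.open(path, "rt")`. Each copy is read once in lockstep, so memory use does not grow with the number of records.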
thondeboer40 wrote:

I think GATK's CombineVariants can do this... I have the same issue but have not confirmed this yet. https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_variantutils_CombineVariants.php

And now that GATK is going to be open source, you should be able to use it freely. It's in Beta5 now and won't be officially released until Jan 9th 2018, but the beta should work... I always loved the VCF manipulation you could do with GATK, and that was back in the 2.4 days, so I can only imagine that it has gotten better.

written 4 weeks ago by thondeboer40
thondeboer40 wrote:

It seems that CombineVariants is no longer part of GATK4. The closest tool in GATK4 is MergeVcfs, but it is not smart: it simply creates duplicate lines and does not merge the annotations... Sorry...

written 4 weeks ago by thondeboer40