Question: concatenate VCFs with same sample set but with varying order
mernster wrote (15 months ago):

Hello,

For each chromosome, I have a multi-sample VCF containing the same set of samples. However, the order of the sample columns varies among the VCFs.

For example:

VCF file 1:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  14  11  13  12  3   6   10  7   9   5   8   2   4   1

VCF file 2:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  5   9   11  13  2   7   12  6   10  14  4   3   8   1

I'd like to concatenate them so that the records for each sample ID end up in the same column. However, since the sample order differs, I can't use bcftools concat.

Any suggestions on how I could either

  1. change the order of the samples in every multisample-vcf so that I can use bcftools
  2. use a different tool to concatenate the VCF files

Cheers

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (15 months ago):

Regarding "use a different tool to concatenate the VCF files":

picard MergeVcfs https://broadinstitute.github.io/picard/command-line-overview.html#MergeVcfs

or

picard GatherVcfs (faster if you don't have any overlapping regions) https://broadinstitute.github.io/picard/command-line-overview.html#GatherVcfs
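
For option 1 in the question (putting the sample columns of every VCF into the same order), here is a minimal Python sketch. It is not from the original thread: the function name `reorder_samples` and the demo records are made up for illustration, and it assumes an uncompressed, tab-separated VCF whose sample set matches the requested order exactly. As far as I know, `bcftools view -S samples.txt` also writes the samples in the order listed in the file, which may be the simpler route, but check that against your bcftools version.

```python
def reorder_samples(vcf_lines, target_order):
    """Yield VCF lines with the sample columns rearranged to target_order.

    Assumes tab-separated lines and that the file holds exactly the
    samples named in target_order (hypothetical helper, not a library API).
    """
    indices = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines carry no sample columns; pass through.
            yield line
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the FORMAT column
            if sorted(samples) != sorted(target_order):
                raise ValueError("sample sets differ; refusing to reorder")
            position = {name: i for i, name in enumerate(samples)}
            indices = [9 + position[name] for name in target_order]
        elif indices is None:
            raise ValueError("data line found before the #CHROM header")
        # The same index map is applied to the header and to every record.
        yield "\t".join(fields[:9] + [fields[i] for i in indices])


# Made-up three-sample example, only to show the column shuffle.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t5\t9\t1",
    "chr36\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\t0/0",
]
reordered = list(reorder_samples(demo, ["1", "5", "9"]))
print("\n".join(reordered))
```

Once every per-chromosome file has been rewritten with the same target order, an order-sensitive tool such as bcftools concat should no longer reject the inputs on account of the sample columns.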


Hi Pierre,

thanks for your answer!

I tried picard GatherVcfs

java -jar $PICARD_HOME/picard.jar GatherVcfs I=35.vcf I=36.vcf O=all.vcf

The VCFs have the same sample IDs but in different order:

bcftools query -l 35.vcf

14 11 13 12 3 6 10 7 9 5 8 2 4 1

bcftools query -l 36.vcf

5 9 11 13 2 7 12 6 10 14 4 3 8 1

I get an error message from picard saying that the sample lists are not identical. However, it doesn't report any samples unique to either file, which makes me think that the problem is, again, the order.

There was a problem with gathering the INPUT.java.lang.IllegalArgumentException: VCFs do not have identical sample lists. Samples unique to first file: []. Samples unique to 36.vcf: [].
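
The empty "unique" lists in that message are consistent with both files holding the same samples in a different order. A minimal Python sketch (not part of the original thread) to check this from the two #CHROM header lines follows. The function names are made up for illustration, and it splits on any whitespace because the headers are pasted here with spaces; in a real VCF the header is tab-separated, so splitting on tabs would be safer there.

```python
def sample_ids(header_line):
    """Return the sample IDs from a '#CHROM' header line.

    The first nine columns are fixed (CHROM .. FORMAT); everything after
    them is a sample ID. Splitting on whitespace is an assumption that
    only suits the pasted example.
    """
    return header_line.split()[9:]


def compare_samples(header_a, header_b):
    """Report samples unique to each header and whether the order matches."""
    a, b = sample_ids(header_a), sample_ids(header_b)
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "same_order": a == b,
    }


# Header lines as quoted in the question (35.vcf and 36.vcf).
h35 = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 14 11 13 12 3 6 10 7 9 5 8 2 4 1"
h36 = "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 5 9 11 13 2 7 12 6 10 14 4 3 8 1"

result = compare_samples(h35, h36)
print(result)
```

For these two headers both "only in" lists come back empty while `same_order` is False, i.e. the sample sets agree and only the column order differs.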

Reply written 15 months ago by mernster

Okay, I have tried

java -jar $PICARD_HOME/picard.jar MergeVcfs I=35.vcf I=36.vcf O=all.vcf

instead of

java -jar $PICARD_HOME/picard.jar GatherVcfs I=35.vcf I=36.vcf O=all.vcf

and it seems to work! Thank you very much.

Reply written 15 months ago by mernster