Question: Merging multiple VCFs by column in Picard
nitinra wrote:

Hello everyone,

I used Picard MergeVcfs to combine individual VCF files with the following command:

java -jar /picard.jar MergeVcfs -I input.vcf -I input2.vcf -I inputs.vcf -O output.vcf.gz

The resulting output file looks like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  20
Chromosome_1    517     .       CTT     C       79.60   .       AC=1;AF=0.500;AN=2;BaseQRankSum=-0.080;DP=16;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=57.36;MQRankSum=-3.750;QD=4.98;ReadPosRankSum=0.068;SOR=0.446     GT:AD:DP:GQ:PL  0/1:13,3:16:87:87,0,537
Chromosome_1    562     .       CATTTCTCTA      C       46.60   .       AC=1;AF=0.500;AN=2;BaseQRankSum=1.180;DP=14;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=53.19;MQRankSum=0.558;QD=3.88;ReadPosRankSum=0.493;SOR=1.179       GT:AD:DP:GQ:PL  0/1:10,2:12:54:54,0,404

The resulting file is 20.1 GB, and the samples seem to have been combined by rows rather than by columns. How can I merge them by column instead, so that the resulting VCF is smaller?

Thank you!

Tags: snp, picard, genome, vcf
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum wrote:

The resulting file is 20.1gb and samples seemed to be combined by rows and not columns.

I suppose all your VCFs contain the same sample name, 20. Each file should have a distinct sample name.

Use bcftools reheader to rename the samples, or GATK 3.8 CombineVariants with --genotypeMergeOptions UNIQUIFY.
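
To make the renaming step concrete, here is a minimal Python sketch of what renaming amounts to on a single-sample VCF: only the last field of the #CHROM header line changes. This is an illustration, not a substitute for bcftools reheader, which works on compressed files and takes a list of new names through its samples-file option. The function name rename_sample and the name sample_A are hypothetical, and the shortened INFO and FORMAT fields are placeholders.

```python
def rename_sample(vcf_lines, new_name):
    """Return VCF lines with the single sample column renamed to new_name."""
    out = []
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # Preserve a trailing newline if the input line had one.
            newline = "\n" if line.endswith("\n") else ""
            fields = line.rstrip("\n").split("\t")
            # Fields 0-8 are fixed (CHROM .. FORMAT); field 9 is the sample.
            if len(fields) != 10:
                raise ValueError("expected exactly one sample column")
            fields[9] = new_name
            line = "\t".join(fields) + newline
        out.append(line)
    return out


# Toy input modelled on the header shown in the question (sample named "20").
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t20"
record = "Chromosome_1\t517\t.\tCTT\tC\t79.60\t.\tDP=16\tGT\t0/1"

renamed = rename_sample([header, record], "sample_A")
print(renamed[0].split("\t")[9])
```

Once every input file carries its own sample name, a column-wise merge has something to distinguish the genotype columns by.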

Baylor College of Medicine, Houston, TX
_r_am wrote:

"Merge" can mean (at least) two operations on VCF data. In bcftools terms, you can either concat VCF data that describe variants in different regions of the same sample, or merge data that describe variants in the same region in different samples. I think your input data determines what Picard's MergeVcfs does; in your case the sample names may all be the same, so the tool works in a concat fashion.
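
The difference between the two operations can be sketched with toy data in Python. This is only a conceptual model, not how bcftools or Picard are implemented: the functions concat and merge, the record layout, and the sample names are all hypothetical, and "./." is used here as the placeholder for a genotype that is absent from one input.

```python
def concat(records_a, records_b):
    """Row-wise: stack site records from inputs that share the same sample."""
    return sorted(records_a + records_b, key=lambda r: r["pos"])


def merge(named_files):
    """Column-wise: one row per site, one genotype column per sample."""
    samples = list(named_files)
    sites = sorted({r["pos"] for recs in named_files.values() for r in recs})
    table = []
    for pos in sites:
        row = {"pos": pos}
        for sample in samples:
            # Use the sample's genotype at this site, or a missing call.
            row[sample] = next(
                (r["gt"] for r in named_files[sample] if r["pos"] == pos),
                "./.",
            )
        table.append(row)
    return samples, table


file_a = [{"pos": 517, "gt": "0/1"}, {"pos": 562, "gt": "0/1"}]
file_b = [{"pos": 517, "gt": "1/1"}]

rows = concat(file_a, file_b)
samples, table = merge({"sample_A": file_a, "sample_B": file_b})

print(len(rows), "rows after concat")
print(len(table), "rows and", len(samples), "sample columns after merge")
```

With row-wise stacking, a site seen in several inputs is repeated once per input, which is consistent with the large output described in the question; column-wise merging keeps one row per site and adds a genotype column per sample.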
