Question: Merging multiple VCFs by column in Picard
nitinra wrote (4 weeks ago):

Hello everyone,

I used Picard MergeVcfs to combine individual VCF files with the following command:

java -jar /picard.jar MergeVcfs -I input.vcf -I input2.vcf -I inputs.vcf -O output.vcf.gz

The resulting output file looks like this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  20
Chromosome_1    517     .       CTT     C       79.60   .       AC=1;AF=0.500;AN=2;BaseQRankSum=-0.080;DP=16;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=57.36;MQRankSum=-3.750;QD=4.98;ReadPosRankSum=0.068;SOR=0.446     GT:AD:DP:GQ:PL  0/1:13,3:16:87:87,0,537
Chromosome_1    562     .       CATTTCTCTA      C       46.60   .       AC=1;AF=0.500;AN=2;BaseQRankSum=1.180;DP=14;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=53.19;MQRankSum=0.558;QD=3.88;ReadPosRankSum=0.493;SOR=1.179       GT:AD:DP:GQ:PL  0/1:10,2:12:54:54,0,404

The resulting file is 20.1 GB, and the samples seem to have been combined by rows rather than by columns. How can I merge them by column instead, so that the output VCF is smaller?

Thank you!

Tags: snp, picard, genome, vcf
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (4 weeks ago):

"The resulting file is 20.1 GB, and the samples seem to have been combined by rows rather than by columns."

I suppose all your VCFs contain the same sample name, 20. Each file should use a different sample name.
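One way to check this is to print the sample columns from each file's #CHROM header line. The following is a minimal sketch that relies only on standard tools (gzip, awk); the function name vcf_samples is a hypothetical helper, not part of Picard or bcftools.

```shell
# Print the sample names of a VCF, one per line.
# Sample names are the columns after FORMAT (column 10 onward) on the #CHROM line.
# vcf_samples is a hypothetical helper name used only for this sketch.
vcf_samples() {
    # $1: path to a plain-text or gzip/bgzip-compressed VCF
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Usage (file names taken from the question):
#   for f in input.vcf input2.vcf inputs.vcf; do
#       printf '%s: %s\n' "$f" "$(vcf_samples "$f")"
#   done
```

If every file reports the same name (here, 20), the tools have no way to tell the samples apart.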

Use bcftools reheader to rename the samples, or use GATK 3.8 CombineVariants with --genotypeMergeOptions UNIQUIFY.
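A rough sketch of the bcftools route is below. It assumes that each input VCF holds exactly one sample, that the input file names are distinct, and that it is acceptable to derive the new sample name from the file name (input.vcf becomes "input"); those choices, and the helper names sample_name_for and rename_and_merge, are assumptions for illustration only. bcftools merge needs bgzip-compressed, indexed inputs, so each renamed file is compressed and indexed first.

```shell
# Derive a sample name from a file name (assumption: input.vcf -> "input").
sample_name_for() {
    basename "$1" | sed -e 's/\.vcf\.gz$//' -e 's/\.vcf$//'
}

# Rename the single sample in each input, compress and index the result,
# then merge all renamed files by column.
# Usage: rename_and_merge merged.vcf.gz input.vcf input2.vcf inputs.vcf
rename_and_merge() {
    out="$1"; shift
    for vcf in "$@"; do
        name="$(sample_name_for "$vcf")"
        # reheader -s takes a file with the new sample names, one per line
        printf '%s\n' "$name" > "$name.samples.txt"
        bcftools reheader -s "$name.samples.txt" "$vcf" \
            | bcftools view -O z -o "$name.renamed.vcf.gz" -
        bcftools index -t "$name.renamed.vcf.gz"
        # Swap the original path for the renamed one in the argument list:
        # append the new file, drop the oldest remaining original.
        set -- "$@" "$name.renamed.vcf.gz"
        shift
    done
    # "$@" now holds only the renamed, indexed files
    bcftools merge -O z -o "$out" "$@"
}
```

The intermediate files are written to the current directory. The merged output should then have one genotype column per sample, with one row per site instead of one row per site per file.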

RamRS (Baylor College of Medicine, Houston, TX) wrote (4 weeks ago):

"Merge" can mean (at least) two operations on VCF data. In bcftools terminology, you can either concat VCF files that describe variants in different regions of the same sample, or merge files that describe variants in the same regions in different samples. I think your input data determines what Picard's MergeVcfs does; in your case, the sample names could be identical, so the tool behaves in a concat fashion.
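The difference in output shape can be illustrated with a toy example. The sketch below does not use real VCF files or bcftools; it uses two small tab-separated tables (a CHROM:POS key plus one genotype column, with made-up genotypes) and standard cat and join, purely to show rows growing in one case and columns growing in the other.

```shell
# Toy illustration only: keyed genotype tables, not real VCF, made-up genotypes.
workdir="$(mktemp -d)"
printf 'Chromosome_1:517\t0/1\nChromosome_1:562\t0/1\n' > "$workdir/sampleA.tsv"
printf 'Chromosome_1:517\t1/1\nChromosome_1:562\t0/0\n' > "$workdir/sampleB.tsv"

# "concat" style: records are stacked, so rows add up (2 + 2 = 4)
# and there is still a single genotype column.
cat "$workdir/sampleA.tsv" "$workdir/sampleB.tsv" > "$workdir/by_rows.tsv"

# "merge" style: records at the same site are joined, so the row count
# stays at 2 and each sample contributes its own column.
TAB="$(printf '\t')"
join -t "$TAB" "$workdir/sampleA.tsv" "$workdir/sampleB.tsv" > "$workdir/by_columns.tsv"

wc -l < "$workdir/by_rows.tsv"
wc -l < "$workdir/by_columns.tsv"
```

With real data, the corresponding commands are bcftools concat (row-wise) and bcftools merge (column-wise). The row-wise result repeats the site and INFO fields once per input file, which is consistent with the large output described in the question.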
