Question: merging vcf files with sample information
8 months ago by
devarora240 wrote:

I am working on GWAS data. I ran a BWA and GATK pipeline and generated a VCF file for each sample. I then merged the VCF files using Picard:

java -jar /680_info4/project/arora/program/picard.jar MergeVcfs I=/680_info4/project/arora1/raw_variants.vcf I=/680_info4/project/arora2/raw_variants.vcf I=/680_info4/project/arora3/raw_variants.vcf I=/680_info4/project/arora4/raw_variants.vcf I=/680_info4/project/arora5/raw_variants.vcf I=/680_info4/project/arora6/raw_variants.vcf I=/680_info4/project/arora7/raw_variants.vcf O=all_raw_variant.vcf

When I ran a PCA script in PLINK, it failed with an error saying there is only one sample. When I checked the merged VCF file, it does indeed contain only one sample:

    1	112	.	C	T	61.60	PASS	AC=1;AF=0.500;AN=2;BaseQRankSum=2.243;DP=117;ExcessHet=3.0103;FS=4.760;MLEAC=1;MLEAF=0.500;MQ=47.51;MQRankSum=-0.497;QD=0.53;ReadPosRankSum=-0.030;SOR=1.493	GT:AD:DP:GQ:PL	0/1:106,10:116:69:69,0,3944
    1	131	.	G	T	1457.60	PASS	AC=1;AF=0.500;AN=2;BaseQRankSum=-0.046;DP=89;ExcessHet=3.0103;FS=2.787;MLEAC=1;MLEAF=0.500;MQ=49.38;MQRankSum=-0.647;QD=16.38;ReadPosRankSum=-0.950;SOR=0.799	GT:AD:DP:GQ:PL	0/1:47,42:89:99:1465,0,1643

May I know what I am doing wrong? Any other way to merge the VCF files, or any other suggestions, would be most welcome.
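A quick way to confirm how many samples a VCF holds is to count the columns after FORMAT on the `#CHROM` header line. The following is a minimal sketch, assuming a tab-delimited, uncompressed VCF; the function name `count_vcf_samples` is made up for illustration.

```shell
# Count sample columns in an uncompressed VCF.
# The #CHROM header line has 8 fixed columns plus FORMAT (column 9);
# every column after that is one sample.
count_vcf_samples() {
  awk -F'\t' '/^#CHROM/ { print (NF > 9 ? NF - 9 : 0); exit }' "$1"
}
```

For example, `count_vcf_samples all_raw_variant.vcf` would print 1 for the merged file described above. If bcftools is installed, `bcftools query -l all_raw_variant.vcf` lists the sample names themselves.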

8 months ago by
Brice Sarver3.5k wrote:

From the Picard documentation for MergeVcfs:

Merges multiple VCF or BCF files into one VCF file. Input files must be sorted by their contigs and, within contigs, by start position. The input files must have the same sample and contig lists. An index file is created and a sequence dictionary is required by default.

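In other words, MergeVcfs assumes every input file contains the same sample(s) and only combines their records; it does not add sample columns. Your output therefore still has a single sample, whereas it sounds like you were expecting 7 based on your input.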

bcftools merge can be used for non-overlapping sample sets and is likely what you should use to create a multisample VCF.
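As a rough sketch of that suggestion: bcftools merge expects bgzip-compressed, indexed inputs, so each VCF is compressed and indexed first. The function name, the `run` dry-run helper, and the file names are illustrative assumptions, not part of bcftools.

```shell
# Print commands instead of running them when DRY_RUN=1 (illustrative helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: merge_single_sample_vcfs OUT.vcf.gz IN1.vcf IN2.vcf ...
merge_single_sample_vcfs() {
  local out="$1"; shift
  local gz=()
  local vcf
  for vcf in "$@"; do
    # Compress to bgzip format and build a tabix index next to each input.
    run bcftools view -Oz -o "$vcf.gz" "$vcf"
    run bcftools index -t "$vcf.gz"
    gz+=("$vcf.gz")
  done
  # Combine the inputs into one multi-sample VCF, one column per sample.
  run bcftools merge -Oz -o "$out" "${gz[@]}"
  run bcftools index -t "$out"
}
```

Calling `DRY_RUN=1 merge_single_sample_vcfs all_samples.vcf.gz sample1.vcf sample2.vcf` prints the commands without running them. Note that bcftools merge stops with an error if two inputs carry the same sample name; its `--force-samples` option proceeds anyway, though renaming the samples beforehand is usually the cleaner fix.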


I tried bcftools merge and got results like this:

1 31124 . C CT 893.64 . BaseQRankSum=-1.111;ExcessHet=3.0103;FS=1.672;MQ=59.45;MQRankSum=1.349;QD=2.49;ReadPosRankSum=0.618;SOR=0.578;DP=648;AF=0.5;MLEAC=1;MLEAF=0.5;AN=6;AC=3 GT:AD:DP:GQ:PL ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. 0/1:294,65:359:99:901,0,7903 ./.:.:.:.:. 0/1:90,19:109:99:250,0,2501 0/1:108,24:132:99:334,0,2948

I don't understand why some samples show missing genotype information (./.:.:.:.:.) in the output.

written 8 months ago by devarora240

The missing info means that a given variant was not in that particular VCF. In your example, 4/7 of your VCFs do not have variants called at that position, hence the missing genotype data.
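To see how widespread this is across a merged file, the missing genotypes can be tallied per record. The sketch below assumes an uncompressed, tab-delimited VCF with GT as the first FORMAT field; the function name `count_missing_genotypes` is made up for illustration.

```shell
# For each record, print CHROM, POS and "missing/total" sample genotypes.
count_missing_genotypes() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      missing = 0
      for (i = 10; i <= NF; i++) {     # sample columns start at column 10
        split($i, f, ":")              # GT is the first colon-separated field
        if (f[1] == "./." || f[1] == ".|." || f[1] == ".") missing++
      }
      print $1 "\t" $2 "\t" missing "/" (NF - 9)
    }' "$1"
}
```

Run on the record shown above, this would report `4/7` for position 31124 on chromosome 1.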

If you are expecting to have reference calls included as well, which aren't in VCFs by default, look into calling gVCFs or using options like --emit-all-sites in the GATK.
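A gVCF-based workflow might look roughly like the sketch below. It assumes GATK4 command syntax (exact option names differ between GATK versions), BAM files named after their samples, and a hypothetical intermediate file `cohort.g.vcf.gz`; the `run` helper and the function name are illustrative only.

```shell
# Print commands instead of running them when DRY_RUN=1 (illustrative helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: joint_genotype REF.fasta OUT.vcf.gz SAMPLE1.bam SAMPLE2.bam ...
joint_genotype() {
  local ref="$1" out="$2"; shift 2
  local variant_args=()
  local bam sample
  for bam in "$@"; do
    sample="${bam%.bam}"
    # Per-sample calling in GVCF mode records reference blocks as well as variants.
    run gatk HaplotypeCaller -R "$ref" -I "$bam" -O "$sample.g.vcf.gz" -ERC GVCF
    variant_args+=(-V "$sample.g.vcf.gz")
  done
  # Combine the per-sample gVCFs, then genotype all samples jointly.
  run gatk CombineGVCFs -R "$ref" "${variant_args[@]}" -O cohort.g.vcf.gz
  run gatk GenotypeGVCFs -R "$ref" -V cohort.g.vcf.gz -O "$out"
}
```

Because every sample is genotyped at every site that is variant in any sample, a sample with adequate coverage gets a homozygous-reference call (0/0) at such a site rather than ./. in the final VCF.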

written 8 months ago by Brice Sarver3.5k