Question: Merging VCF files with sample information
devarora120 (South Korea) wrote, 7 weeks ago:

I am working on GWAS data. I ran a BWA + GATK pipeline and generated a VCF file for each sample, then merged the VCF files using Picard:

java -jar /680_info4/project/arora/program/picard.jar MergeVcfs \
    I=/680_info4/project/arora1/raw_variants.vcf \
    I=/680_info4/project/arora2/raw_variants.vcf \
    I=/680_info4/project/arora3/raw_variants.vcf \
    I=/680_info4/project/arora4/raw_variants.vcf \
    I=/680_info4/project/arora5/raw_variants.vcf \
    I=/680_info4/project/arora6/raw_variants.vcf \
    I=/680_info4/project/arora7/raw_variants.vcf \
    O=all_raw_variant.vcf

When I ran a PCA plot script in PLINK, it failed with an error saying there was only one sample. Looking at the merged VCF file, it indeed contains only one sample:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 20
1 112 . C T 61.60 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=2.243;DP=117;ExcessHet=3.0103;FS=4.760;MLEAC=1;MLEAF=0.500;MQ=47.51;MQRankSum=-0.497;QD=0.53;ReadPosRankSum=-0.030;SOR=1.493 GT:AD:DP:GQ:PL 0/1:106,10:116:69:69,0,3944
1 131 . G T 1457.60 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-0.046;DP=89;ExcessHet=3.0103;FS=2.787;MLEAC=1;MLEAF=0.500;MQ=49.38;MQRankSum=-0.647;QD=16.38;ReadPosRankSum=-0.950;SOR=0.799 GT:AD:DP:GQ:PL 0/1:47,42:89:99:1465,0,1643
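A quick way to confirm how many samples a VCF holds is to read the #CHROM header line: the first nine columns (#CHROM through FORMAT) are fixed, and every column after FORMAT is one sample. Below is a minimal Python sketch of that check, assuming an uncompressed, tab-delimited VCF; the function name `vcf_samples` is hypothetical.

```python
def vcf_samples(lines):
    """Return the sample names from the #CHROM header line of a VCF.

    Columns 0-8 (#CHROM ... FORMAT) are fixed; each column after FORMAT
    is one sample.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached data lines without seeing the #CHROM header
    raise ValueError("no #CHROM header line found")


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t20\n"
print(vcf_samples([header]))  # ['20']
```

On a real file, `with open("all_raw_variant.vcf") as fh: print(len(vcf_samples(fh)))` would print 1 for the merged file above.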

What am I doing wrong? Any other way to merge VCF files, or any other suggestions, are most welcome.

Brice Sarver (United States) wrote, 7 weeks ago:

From the Picard documentation for MergeVcfs:

Merges multiple VCF or BCF files into one VCF file. Input files must be sorted by their contigs and, within contigs, by start position. The input files must have the same sample and contig lists. An index file is created and a sequence dictionary is required by default.

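In other words, MergeVcfs expects every input to have the same sample list (here, a single sample) and combines their variant records; it does not add sample columns. It sounds like you were expecting 7 samples based on your input.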
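To make the distinction concrete, here is a toy Python model (not Picard's code) of the behaviour the quoted documentation describes: inputs that share one sample list and are already sorted by contig and position get interleaved into one sorted record stream, so the output keeps that single sample list. Records are simplified to hypothetical `(chrom, pos, genotypes)` tuples, and `merge_same_samples` is an illustrative name.

```python
import heapq


def merge_same_samples(inputs, contig_order):
    """Toy model of a record-wise merge of VCFs sharing one sample list.

    inputs: list of (samples, records); records are (chrom, pos, genotypes)
    tuples already sorted by contig, then by position.
    contig_order: contig names in sequence-dictionary order.
    """
    sample_lists = {tuple(samples) for samples, _ in inputs}
    if len(sample_lists) != 1:
        raise ValueError("inputs must have the same sample list")
    rank = {contig: i for i, contig in enumerate(contig_order)}
    # heapq.merge relies on each input already being sorted, mirroring the
    # "input files must be sorted" requirement.
    merged = heapq.merge(*(records for _, records in inputs),
                         key=lambda rec: (rank[rec[0]], rec[1]))
    return list(sample_lists.pop()), list(merged)


samples, records = merge_same_samples(
    [(["20"], [("1", 112, ["0/1"])]), (["20"], [("1", 131, ["0/1"])])],
    contig_order=["1"],
)
print(samples)  # ['20']: still one sample column after merging
print([pos for _, pos, _ in records])  # [112, 131]
```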

bcftools merge can be used for non-overlapping sample sets and is likely what you should use to create a multisample VCF.
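By contrast, a sample-wise merge in the spirit of bcftools merge puts each input sample in its own column and takes the union of all sites; a sample with no record at a site gets the missing genotype `./.`. The Python below is a simplified sketch of that idea, not bcftools' implementation. The input layout (one `{(chrom, pos, ref, alt): genotype}` map per single-sample VCF) and the name `merge_sample_columns` are assumptions for illustration.

```python
def merge_sample_columns(inputs):
    """Toy sample-wise merge of single-sample VCFs.

    inputs: list of (sample_name, {(chrom, pos, ref, alt): genotype}).
    Returns (samples, rows), one row per site in the union of all sites.
    """
    samples = [name for name, _ in inputs]
    if len(set(samples)) != len(samples):
        raise ValueError("sample names must not overlap")
    # Sorted lexically for simplicity; real tools follow the header's contig order.
    sites = sorted(set().union(*(calls.keys() for _, calls in inputs)))
    rows = []
    for site in sites:
        # A sample without a call at this site gets the missing genotype.
        rows.append((site, [calls.get(site, "./.") for _, calls in inputs]))
    return samples, rows


site = ("1", 31124, "C", "CT")
samples, rows = merge_sample_columns([
    ("A", {site: "0/1"}),
    ("B", {}),
    ("C", {site: "0/1"}),
])
print(samples)  # ['A', 'B', 'C']
print(rows)     # [(('1', 31124, 'C', 'CT'), ['0/1', './.', '0/1'])]
```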


I tried bcftools merge and got results like this:

1 31124 . C CT 893.64 . BaseQRankSum=-1.111;ExcessHet=3.0103;FS=1.672;MQ=59.45;MQRankSum=1.349;QD=2.49;ReadPosRankSum=0.618;SOR=0.578;DP=648;AF=0.5;MLEAC=1;MLEAF=0.5;AN=6;AC=3 GT:AD:DP:GQ:PL ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. 0/1:294,65:359:99:901,0,7903 ./.:.:.:.:. 0/1:90,19:109:99:250,0,2501 0/1:108,24:132:99:334,0,2948

I don't understand why some samples show missing information (./.:.:.:.:.).

(Reply by devarora120, 7 weeks ago)

The missing information means that a given variant was not present in that particular VCF. In your example, 4 of your 7 VCFs have no variant called at that position, hence the missing genotype data.
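The record posted above can be checked directly. The Python sketch below (the helper name `genotype_summary` is hypothetical) takes GT from each sample column, counts genotypes with a missing allele, and compares the number of called alleles with AN from INFO. It splits on whitespace because the excerpt was pasted with spaces; real VCF lines are tab-delimited, which `split()` also handles.

```python
def genotype_summary(data_line):
    """Count called vs missing genotypes on one VCF data line.

    Columns 0-8 are CHROM..FORMAT; columns 9+ are samples. A genotype is
    counted as missing if any of its alleles is ".".
    """
    cols = data_line.split()
    gt_idx = cols[8].split(":").index("GT")
    gts = [sample.split(":")[gt_idx] for sample in cols[9:]]
    alleles = [gt.replace("|", "/").split("/") for gt in gts]
    missing = sum(1 for a in alleles if "." in a)
    called_alleles = sum(1 for a in alleles for allele in a if allele != ".")
    info = dict(kv.split("=", 1) for kv in cols[7].split(";") if "=" in kv)
    return {"samples": len(gts), "missing": missing,
            "called_alleles": called_alleles, "AN": int(info["AN"])}


line = ("1 31124 . C CT 893.64 . "
        "BaseQRankSum=-1.111;ExcessHet=3.0103;FS=1.672;MQ=59.45;MQRankSum=1.349;"
        "QD=2.49;ReadPosRankSum=0.618;SOR=0.578;DP=648;AF=0.5;MLEAC=1;MLEAF=0.5;AN=6;AC=3 "
        "GT:AD:DP:GQ:PL ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. 0/1:294,65:359:99:901,0,7903 "
        "./.:.:.:.:. 0/1:90,19:109:99:250,0,2501 0/1:108,24:132:99:334,0,2948")
print(genotype_summary(line))
# {'samples': 7, 'missing': 4, 'called_alleles': 6, 'AN': 6}
```

AN=6 matches the 3 called diploid samples (3 × 2 alleles), i.e. the 4 missing samples do not contribute to AN.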

If you expect reference calls to be included as well (VCFs do not contain them by default), look into calling gVCFs or using options like --emit-all-sites in GATK.
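Why gVCFs help: a gVCF also records reference blocks, stretches where the sample was confidently homozygous reference, so a downstream step can tell "no data" apart from "reference". The Python below is a deliberately simplified illustration of that idea only; GATK's joint genotyping re-genotypes from likelihoods and does far more. The input shapes (rows from a sample-wise merge, per-sample `(chrom, start, end)` blocks with inclusive 1-based coordinates) and the name `fill_from_reference_blocks` are assumptions.

```python
def fill_from_reference_blocks(samples, rows, ref_blocks):
    """Replace './.' with '0/0' where a sample's reference block covers the site.

    samples: sample names in column order.
    rows: list of ((chrom, pos, ref, alt), [genotype per sample]).
    ref_blocks: {sample: [(chrom, start, end), ...]}, inclusive 1-based,
        standing in for a gVCF's homozygous-reference blocks.
    Sites with no coverage information stay './.'.
    """
    def covered(sample, chrom, pos):
        return any(c == chrom and start <= pos <= end
                   for c, start, end in ref_blocks.get(sample, []))

    filled = []
    for (chrom, pos, ref, alt), gts in rows:
        new_gts = ["0/0" if gt == "./." and covered(s, chrom, pos) else gt
                   for s, gt in zip(samples, gts)]
        filled.append(((chrom, pos, ref, alt), new_gts))
    return filled


rows = [(("1", 31124, "C", "CT"), ["0/1", "./.", "./."])]
print(fill_from_reference_blocks(["A", "B", "C"], rows, {"B": [("1", 31000, 31200)]}))
# [(('1', 31124, 'C', 'CT'), ['0/1', '0/0', './.'])]
```

Sample C stays `./.` because nothing says it was covered there, which is exactly the ambiguity a plain VCF leaves you with.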

(Reply by Brice Sarver, 7 weeks ago)