merging vcf files with sample information
4.2 years ago
tothepoint ▴ 800

I am working on GWAS data. I ran the bwa and GATK pipeline and generated a VCF file for each sample. I then merged the VCF files using Picard:

java -jar /680_info4/project/arora/program/picard.jar MergeVcfs I=/680_info4/project/arora1/raw_variants.vcf I=/680_info4/project/arora2/raw_variants.vcf I=/680_info4/project/arora3/raw_variants.vcf I=/680_info4/project/arora4/raw_variants.vcf I=/680_info4/project/arora5/raw_variants.vcf I=/680_info4/project/arora6/raw_variants.vcf I=/680_info4/project/arora7/raw_variants.vcf O=all_raw_variant.vcf

When I ran the PCA plot script in plink, it failed with an error saying there is only one sample. I checked the merged VCF file, and it does contain only one sample:

> #CHROM    POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  20
> 1    112 .   C   T   61.60   PASS    AC=1;AF=0.500;AN=2;BaseQRankSum=2.243;DP=117;ExcessHet=3.0103;FS=4.760;MLEAC=1;MLEAF=0.500;MQ=47.51;MQRankSum=-0.497;QD=0.53;ReadPosRankSum=-0.030;SOR=1.493   GT:AD:DP:GQ:PL  0/1:106,10:116:69:69,0,3944
> 1    131 .   G   T   1457.60 PASS    AC=1;AF=0.500;AN=2;BaseQRankSum=-0.046;DP=89;ExcessHet=3.0103;FS=2.787;MLEAC=1;MLEAF=0.500;MQ=49.38;MQRankSum=-0.647;QD=16.38;ReadPosRankSum=-0.950;SOR=0.799  GT:AD:DP:GQ:PL  0/1:47,42:89:99:1465,0,1643

May I know what I am doing wrong? Any other way to merge VCF files, or any other suggestion, is most welcome.

vcf gwas • 1.7k views
4.2 years ago
Brice Sarver ★ 3.8k

From the Picard documentation for MergeVcfs:

Merges multiple VCF or BCF files into one VCF file. Input files must be sorted by their contigs and, within contigs, by start position. The input files must have the same sample and contig lists. An index file is created and a sequence dictionary is required by default.

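MergeVcfs assumes that all inputs share the same sample list (a single sample, in your case) and simply combines their records into one file. It sounds like you were expecting 7 samples based on your input.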
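To confirm how many samples a VCF actually contains, you can count the columns after FORMAT on the `#CHROM` header line. This is a minimal sketch, assuming an uncompressed, tab-delimited VCF; the function name `count_vcf_samples` and the demo file are made up for illustration.

```shell
# Count sample columns in a VCF. The #CHROM header line has 9 fixed
# columns (CHROM through FORMAT); every column after that is a sample.
count_vcf_samples() {
    awk -F'\t' '/^#CHROM/ { print (NF > 9 ? NF - 9 : 0); exit }' "$1"
}

# Tiny made-up VCF, used only to demonstrate the function.
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t20\n' >> "$demo_vcf"
printf '1\t112\t.\tC\tT\t61.60\tPASS\tDP=117\tGT\t0/1\n' >> "$demo_vcf"

n_samples=$(count_vcf_samples "$demo_vcf")
echo "samples: $n_samples"
rm -f "$demo_vcf"
```

If bcftools is installed, `bcftools query -l file.vcf` lists the sample names directly.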

bcftools merge can be used for non-overlapping sample sets and is likely what you should use to create a multisample VCF.
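A rough sketch of that workflow follows. It assumes `bgzip` and `bcftools` are on your PATH; the function name `merge_samples` and the file names in the usage comment are hypothetical. bcftools merge expects bgzip-compressed, indexed inputs, so the sketch compresses and indexes each file first.

```shell
# Sketch only: merge single-sample VCFs into one multi-sample VCF.
# Usage (hypothetical paths): merge_samples merged.vcf.gz s1.vcf s2.vcf s3.vcf
merge_samples() {
    out=$1
    shift
    for vcf in "$@"; do
        bgzip -c "$vcf" > "$vcf.gz" || return 1   # compress a copy, keep the original
        bcftools index -f "$vcf.gz" || return 1   # each input needs an index
        set -- "$@" "$vcf.gz"                     # append the .gz name to the argument list
        shift                                     # ... and drop the matching original name
    done
    # "$@" now holds only the compressed inputs, in their original order.
    bcftools merge -O z -o "$out" "$@" || return 1
    bcftools index -f "$out"
    bcftools query -l "$out"                      # print the sample names in the merged file
}
```

Note that bcftools merge stops if two inputs carry the same sample name. In that case either rename the samples first (for example with `bcftools reheader`) or pass `--force-samples` and check the resulting names.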


I tried bcftools merge and got results like:

1 31124 . C CT 893.64 . BaseQRankSum=-1.111;ExcessHet=3.0103;FS=1.672;MQ=59.45;MQRankSum=1.349;QD=2.49;ReadPosRankSum=0.618;SOR=0.578;DP=648;AF=0.5;MLEAC=1;MLEAF=0.5;AN=6;AC=3 GT:AD:DP:GQ:PL ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. 0/1:294,65:359:99:901,0,7903 ./.:.:.:.:. 0/1:90,19:109:99:250,0,2501 0/1:108,24:132:99:334,0,2948

I don't understand why some samples show missing information (./.:.:.:.:.) in the genotype columns.


The missing info means that a given variant was not in that particular VCF. In your example, 4/7 of your VCFs do not have variants called at that position, hence the missing genotype data.
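To see how widespread this is across the merged file, you could tally missing genotypes per site. This is a minimal sketch, assuming an uncompressed, tab-delimited VCF with diploid genotypes; the function name `count_missing_gt`, the sample names s1 to s7, and the shortened INFO field in the demo record are made up for illustration.

```shell
# For each variant record, print CHROM:POS and the number of samples whose
# genotype is missing. Sample columns start at field 10, and GT is the
# first colon-separated subfield of each sample column.
count_missing_gt() {
    awk -F'\t' '!/^#/ {
        missing = 0
        for (i = 10; i <= NF; i++) {
            split($i, f, ":")
            if (f[1] == "." || f[1] == "./." || f[1] == ".|.") missing++
        }
        printf "%s:%s %d/%d\n", $1, $2, missing, NF - 9
    }' "$1"
}

# Demo record modelled on the merged line quoted above (INFO shortened).
demo_vcf=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\ts4\ts5\ts6\ts7\n' > "$demo_vcf"
printf '1\t31124\t.\tC\tCT\t893.64\t.\tAN=6;AC=3\tGT:AD:DP:GQ:PL\t./.:.:.:.:.\t./.:.:.:.:.\t./.:.:.:.:.\t0/1:294,65:359:99:901,0,7903\t./.:.:.:.:.\t0/1:90,19:109:99:250,0,2501\t0/1:108,24:132:99:334,0,2948\n' >> "$demo_vcf"

result=$(count_missing_gt "$demo_vcf")
echo "$result"
rm -f "$demo_vcf"
```

For the demo record, four of the seven samples have no call, which matches the 4/7 mentioned above.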

If you are expecting to have reference calls included as well, which aren't in VCFs by default, look into calling gVCFs and genotyping them jointly, or into the GATK options that emit all sites (for example, --include-non-variant-sites for GenotypeGVCFs in GATK4).
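The gVCF route could look roughly like the sketch below. It assumes GATK4 with the `gatk` wrapper on your PATH and a reference FASTA that already has its index and sequence dictionary. The function names and all file names are hypothetical, and the options should be checked against the documentation for your GATK version.

```shell
# Sketch only: per-sample gVCF calling followed by joint genotyping (GATK4).

# Usage (hypothetical paths): call_gvcf ref.fasta sample1.bam sample1.g.vcf.gz
call_gvcf() {
    gatk HaplotypeCaller -R "$1" -I "$2" -O "$3" -ERC GVCF
}

# Usage (hypothetical paths): joint_genotype ref.fasta cohort.vcf.gz s1.g.vcf.gz s2.g.vcf.gz
joint_genotype() {
    ref=$1
    out=$2
    shift 2
    combined="${out%.vcf.gz}.combined.g.vcf.gz"
    for g in "$@"; do
        set -- "$@" --variant "$g"   # build the list: --variant a --variant b ...
        shift                        # drop the bare file name just handled
    done
    gatk CombineGVCFs -R "$ref" "$@" -O "$combined" || return 1
    gatk GenotypeGVCFs -R "$ref" -V "$combined" -O "$out"
}
```

Because every sample is genotyped at every site that is variant in the cohort, samples that match the reference get a 0/0 call instead of ./. wherever they have coverage.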
