2.6 years ago
rj.rezwan
Hi, I combined 64 .vcf files using CombineGVCFs in GATK. The command completed successfully, but the output shows only one sample column. Why are the other samples missing? The output is here:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1
chr01 9883 . A C 1105.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.00;DP=2201;ExcessHet=3.0103;FS=1.813;MLEAC=1;MLEAF=0.500;MQ=59.98;MQRankSum=0.00;QD=33.50;ReadPosRankSum=-3.460e-01;SOR=1.302 GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:6,27:33:99:0|1:9847_G_A:1113,0,1180:9847
chr01 9903 . C T 567.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.910e-01;DP=2438;ExcessHet=3.0103;FS=10.281;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=16.22;ReadPosRankSum=0.047;SOR=0.544 GT:AD:DP:GQ:PL 0/1:19,16:35:99:575,0,722
chr01 10056 . C T 1646.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-5.220e-01;DP=2908;ExcessHet=3.0103;FS=2.244;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=26.56;ReadPosRankSum=-1.900e-01;SOR=0.890 GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:21,41:62:99:0|1:10056_C_T:1654,0,755:10056
chr01 10114 . A G 1185.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.00;DP=3067;ExcessHet=3.0103;FS=4.175;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=22.80;ReadPosRankSum=0.207;SOR=1.496 GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:22,30:52:99:0|1:10113_C_T:1193,0,1095:10113
chr01 10115 . T TGC 138.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.635;DP=3045;ExcessHet=3.0103;FS=6.896;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.00;QD=2.95;ReadPosRankSum=-1.438e+00;SOR=1.141 GT:AD:DP:GQ:PL 0/1:41,6:47:99:146,0,1628
chr01 10177 . G A 328.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.595;DP=3217;ExcessHet=3.0103;FS=1.707;MLEAC=1;MLEAF=0.500;MQ=59.98;MQRankSum=0.00;QD=14.94;ReadPosRankSum=0.057;SOR=1.179 GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:14,8:22:99:0|1:10162_G_A:336,0,561:10162
chr01 10181 . C T 1144.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=-2.800e-02;DP=3238;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.98;MQRankSum=0.00;QD=31.79;ReadPosRankSum=0.134;SOR=0.997 GT:AD:DP:GQ:PL 0/1:8,28:36:99:1152,0,252
What was the syntax of the CombineGVCFs command you ran?
Here is the command:
Here are the names of a few of the VCF files listed in vcfs.list:
Are all those files GVCFs? Can you show the output of:
It shows this output:
Now you know what's happening. All those VCF files have the same sample name, so CombineGVCFs merely overwrites them. Use bcftools reheader -s to rename the sample in each VCF file and then run CombineGVCFs.
One more request: I want the sample name in the header to match the file name. For example, some of the file names are as follows:
So what should the bcftools reheader -s command be?
You should sanitize the names - [ is not a good choice in a file name. Why not use ..._Guanhuabai_... instead of ..._[Guanhuabai]_...? Also, using the full file name is probably not a good choice, as sample names should be as short as possible. I'd recommend a compromise: use the part before the _[.
As for the exact command, I think it'd be a good learning exercise for you to figure it out. First, create a file with the old and new names as specified in the manual - do this by hand if required, but sed (or even cut) should help you automate it. Once the file is ready, use bcftools reheader -s your_sample_name_mapping.file input.vcf | bcftools view -h | grep "^#CHROM" to see if it worked. Keep tweaking your_sample_name_mapping.file until it works, and once it does, you should be able to use reheader's -o option to output to a file.
Another tip would be to rename the current files (from .vcf to, say, .old.vcf) and output the new files to the existing file names so you don't have to change vcfs.list.
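For anyone landing here later, the steps above could be sketched roughly as follows. This is a minimal sketch, not a tested pipeline. It assumes bash, uncompressed .vcf files listed one per line in vcfs.list, a single sample called sample1 in every file (as in the header shown in the question), and file names of the form PREFIX_[Something]_rest.vcf. The helper names short_name and write_mapping, the .map suffix, and the example file name in the comment are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Short sample name = file name without directory and .vcf, cut at the first "_[".
# e.g. a hypothetical A12_[Guanhuabai]_run1.vcf would give A12
short_name() {
    local base
    base=$(basename "$1" .vcf)
    printf '%s\n' "${base%%_\[*}"
}

# Write an "old_name new_name" line, the pair format accepted by reheader -s.
# $1 = VCF path, $2 = current sample name, $3 = mapping file to create
write_mapping() {
    printf '%s %s\n' "$2" "$(short_name "$1")" > "$3"
}

# Only run the renaming when bcftools and vcfs.list are actually present.
if command -v bcftools >/dev/null 2>&1 && [ -f vcfs.list ]; then
    while IFS= read -r vcf; do
        [ -n "$vcf" ] || continue
        old="${vcf%.vcf}.old.vcf"
        [ -e "$old" ] || mv "$vcf" "$old"      # keep the original as .old.vcf
        write_mapping "$vcf" sample1 "$vcf.map"
        # Write the renamed VCF back under the original name, so vcfs.list still works.
        bcftools reheader -s "$vcf.map" -o "$vcf" "$old"
    done < vcfs.list
fi
```

Before rerunning CombineGVCFs, it is worth checking a couple of the new files with bcftools view -h file.vcf | grep "^#CHROM" to confirm that each one now carries its own sample name.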