Vcf Tools Error
8.3 years ago

I'm working on a project that involves merging four VCF files. The first three datasets ran smoothly through vcftools' vcf-merge, but merging the final dataset produces the following error:

Uh: 4 vs 1

at Vcf.pm line 169

Vcf::throw('Vcf4_0=HASH(0xa2d9870)', 'Uh: 4 vs 1\x{a}') called at Vcf.pm line 1492

VcfReader::format_haplotype('Vcf4_0=HASH(0xa2d9870)', 'ARRAY(0xa3e65c0)', 'ARRAY(0xa412cb0)') called at /path/to/vcftools_0.1.6/perl/vcf-merge line 426

main::merge_vcf_files('HASH(0x9f54170)') called at /path/to/vcftools_0.1.6/perl/vcf-merge line 12

All files have been compressed with bgzip, indexed with tabix, and should be in proper VCF4 format.
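One thing that can be checked independently of vcf-merge is whether any sample column carries more colon-separated values than its FORMAT column declares. The sketch below is only a diagnostic under that assumption; the function name `find_format_mismatches` is made up for illustration, and nothing here establishes that such a mismatch is what triggers the "Uh: 4 vs 1" error.

```python
def find_format_mismatches(lines):
    """Yield (line_number, sample_index, n_keys, n_values) for every sample
    column that has more ':'-separated values than the FORMAT column has keys.

    `lines` is any iterable of tab-delimited VCF lines (for example an open
    text file handle). Header lines starting with '#' are skipped.
    """
    for line_number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue  # no FORMAT/genotype columns on this record
        n_keys = len(cols[8].split(":"))
        for sample_index, sample in enumerate(cols[9:]):
            n_values = len(sample.split(":"))
            # VCF allows trailing fields to be dropped, so only flag extras.
            if n_values > n_keys:
                yield line_number, sample_index, n_keys, n_values
```

Running this over each input file separately (after decompressing, or via a gzip text handle) would show whether one of the four datasets has records of this kind.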

vcftools tabix • 2.4k views

Where are your VCFs coming from? Can you post a few lines from each of them? My guess is that your VCFs are wonky, through no fault of yours. I can probably hack Vcf.pm to fix your problem.
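For pulling out a few records to post, a minimal sketch along these lines should work on bgzip-compressed files, since Python's gzip module reads them as ordinary multi-member gzip. The helper name `first_records` and the ".gz" suffix check are assumptions made for illustration.

```python
import gzip
from itertools import islice


def first_records(path, n=3):
    """Return the first n non-header lines of a plain or gzip/bgzip VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Skip meta-information and column header lines, which start with '#'.
        records = (line.rstrip("\n") for line in handle if not line.startswith("#"))
        return list(islice(records, n))
```

Calling `first_records("some_file.vcf.gz")` (a placeholder path) returns the first three data lines as strings, which can then be trimmed to a few sample columns before posting.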


The first three VCF files (the ones that merged successfully) are from the UK10K project; the last one is from the 1000 Genomes Project.

A few lines from the UK10K datasets (only the first few patients are shown):

1 14976 rs71252251 G A,C 999 MinVQSLOD BaseQRankSum=-8.331;CSQ=ENST00000423562:WASH7P:WITHIN_NON_CODING_GENE+ENST00000430492:WASH7P:DOWNSTREAM+ENST00000438504:WASH7P:WITHIN_NON_CODING_GENE+ENST00000450305:DDX11L1:DOWNSTREAM+ENST00000456328:DDX11L1:DOWNSTREAM+ENST00000488147:WASH7P:WITHIN_NON_CODING_GENE,INTRONIC+ENST00000515242:DDX11L1:DOWNSTREAM+ENST00000518655:DDX11L1:DOWNSTREAM+ENST00000537342:WASH7P:DOWNSTREAM+ENST00000538476:WASH7P:WITHIN_NON_CODING_GENE,INTRONIC+ENST00000541675:WASH7P:WITHIN_NON_CODING_GENE;DP4=292802,217007,21432,16455;DP=558725;Dels=0.00;FS=633.847;HaplotypeScore=2.5910;InbreedingCoeff=-0.2474;MDV=100;MQ0=183453;MQ=19;MQRankSum=87.614;MSD=375;MSQ=99;PV0=0.001;PV1=7.6e-06;PV2=1;PV3=1;PV4=0.001,7.6e-06,1,1;QD=1.32;ReadPosRankSum=20.051;SB=0.4833;VDB=0.0434;VQSLOD=-26.1505;culprit=MQ;AN=662;AC=9,0 GT:DP:DV:GQ:PL:QP:SP 0/0:179:20:99:0,236,213,255,255,255:50:3 0/0:73:0:99:0,220,180,220,180,180:40:0 0/0:150:17:99:0,168,160,255,208,229:49:2

1 15118 rs71252250 A G 999 MinVQSLOD BaseQRankSum=119.682;CSQ=ENST00000423562:WASH7P:WITHIN_NON_CODING_GENE,INTRONIC+ENST00000430492:WASH7P:DOWNSTREAM+ENST00000438504:WASH7P:WITHIN_NON_CODING_GENE,INTRONIC+ENST00000450305:DDX11L1:DOWNSTREAM+ENST00000456328:DDX11L1:DOWNSTREAM+ENST00000488147:WASH7P:WITHIN_NON_CODING_GENE,INTRONIC+ENST00000515242:DDX11L1:DOWNSTREAM+ENST00000518655:DDX11L1:DOWNSTREAM+ENST00000537342:WASH7P:DOWNSTREAM+ENST00000538476:WASH7P:WITHIN_NON_CODING_GENE,INTRONIC+ENST00000541675:WASH7P:WITHIN_NON_CODING_GENE,INTRONIC;DP4=6455,94784,3011,38261;DP=149166;Dels=0.00;FS=0.000;HRun=0;HWE=0.000000;HaplotypeScore=0.4952;ICF=-0.20697;InbreedingCoeff=-0.4571;MDV=77;MQ0=96363;MQ=11;MQRankSum=14.607;MSD=148;MSQ=68;PV0=3.8e-10;PV1=1;PV2=1;PV3=1;PV4=3.8e-10,1,1,1;QD=1.58;ReadPosRankSum=-2.311;SB=0.0760;VDB=0.0421;VQSLOD=-39.5528;culprit=MQ;AN=306;AC=48 GT:DP:DV:GQ:PL:QP:SP 0/0:62:12:35:0,35,56:45:0 0/0:22:1:43:0,63,37:25:0

1 120994 . AAT A 999 EndDistBias CSQ=ENST00000466430:ENSG00000238009:UPSTREAM+ENST00000471248:ENSG00000238009:WITHIN_NON_CODING_GENE,INTRONIC+ENST00000477740:ENSG00000238009:WITHIN_NON_CODING_GENE,INTRONIC;DP4=5,34528,0,679;DP=59595;HWE=0.000029;ICF=0.10466;INDEL;IS=12,0.123711;MDV=15;MQ=32;MSD=78;MSQ=99;PV0=1;PV1=1;PV2=2.3e-70;PV3=1.9e-72;PV4=1,1,2.3e-70,1.9e-72;QD=0.0823;SB=0.0015;VDB=0.0233;AN=118;AC=1 GT:DP:DV:GQ:PL:QP:SP ./.:7:0:35:0,21,53:28:0 0/0:30:0:99:0,90,165:40:0 ./.:0:0:0:0,0,0:.:0 ./.:5:0:29:0,15,12:13:0 0/0:18:0:68:0,54,158:39:0 ./.:4:0:26:0,12,66:30:0 ./.:6:0:32:0,18,72:31:0 0/0:21:0:77:0,63,160:39:0 0/0:16:0:62:0,48,154:39:0 ./.:2:0:20:0,6,45:27:0 ./.:1:0:17:0,3,23:20:0 ./.:14:0:56:0,42,161:39:0 ./.:6:0:32:0,18,56:29:0 ./.:2:0:20:0,6,7:8:0 ./.:2:2:14:49,6,0:27:0 ./.:0:0:0:0,0,0:.:0 ./.:3:0:23:0,9,59:29:0 0/0:22:0:80:0,66,156:39:0 ./.:3:0:23:0,9,43:26:0

A few lines from the 1000 Genomes dataset:

1 10583 rs58108140 G A 100 PASS AVGPOST=0.7707;RSQ=0.4319;LDAF=0.2327;ERATE=0.0161;AN=2184;VT=SNP;AA=.;THETA=0.0046;AC=314;SNPSOURCE=LOWCOV;AF=0.14;ASN_AF=0.13;AMR_AF=0.17;AFR_AF=0.04;EUR_AF=0.21 GT:DS:GL 0|0:0.200:-0.18,-0.47,-2.42 0|0:0.150:-0.24,-0.44,-1.16 0|0:0.150:-0.15,-0.54,-3.12 0|1:0.600:-0.48,-0.48,-0.48

1 13957 rs201747181 TC T 28 PASS AA=TC;AC=35;AF=0.02;AFR_AF=0.02;AMR_AF=0.02;AN=2184;ASN_AF=0.01;AVGPOST=0.8711;ERATE=0.0065;EUR_AF=0.02;LDAF=0.0788;RSQ=0.2501;THETA=0.0100;VT=INDEL GT:DS:GL 0|0:0.050:0,0,0 0|1:0.650:0,0,0 0|0:0.100:0,0,0 0|0:0.350:0,0,0 0|0:0.050:0.00,-0.30,-4.10 0|0:0.650:0,0,0 0|0:0.150:0,0,0 0|0:0.150:0.00,-0.30,-4.10

1 55249 rs200769871 C CTATGG 443 PASS AA=C;AC=151;AF=0.07;AFR_AF=0.03;AMR_AF=0.08;AN=2184;ASN_AF=0.16;AVGPOST=0.9073;ERATE=0.0063;EUR_AF=0.02;LDAF=0.0968;RSQ=0.5891;THETA=0.0038;VT=INDEL GT:DS:GL 0|0:0.000:0.00,-0.30,-6.70 0|0:0.000:0,0,0 0|0:0.200:0,0,0 0|0:0.050:0,0,0 0|1:0.800:0,0,0 0|0:0.000:0.00,-0.30,-6.70
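The two sets of records above differ visibly in their genotype columns: the UK10K lines use FORMAT GT:DP:DV:GQ:PL:QP:SP with "/" genotypes, while the 1000 Genomes lines use GT:DS:GL with "|" genotypes. A small sketch for summarising that per record is below. The function name `summarise_genotype_columns` is hypothetical, the example records are copied from this post with the INFO column shortened, and whether this difference has anything to do with the vcf-merge error is not established here.

```python
def summarise_genotype_columns(record):
    """Return (format_keys, separators) for one VCF data line.

    The line is split on any whitespace, because the tabs were lost when the
    records were pasted here; this assumes no column contains a space.
    `separators` is a sorted list of the GT separators seen ('/' and/or '|').
    """
    cols = record.split()
    format_keys = cols[8].split(":")
    separators = set()
    if "GT" in format_keys:
        gt_index = format_keys.index("GT")
        for sample in cols[9:]:
            fields = sample.split(":")
            if gt_index < len(fields):
                separators.update(ch for ch in fields[gt_index] if ch in "/|")
    return format_keys, sorted(separators)


# Records from the post, with INFO trimmed to AN/AC for readability.
uk10k = ("1 15118 rs71252250 A G 999 MinVQSLOD AN=306;AC=48 "
         "GT:DP:DV:GQ:PL:QP:SP 0/0:62:12:35:0,35,56:45:0 0/0:22:1:43:0,63,37:25:0")
kg = ("1 10583 rs58108140 G A 100 PASS AN=2184;AC=314 "
      "GT:DS:GL 0|0:0.200:-0.18,-0.47,-2.42 0|1:0.600:-0.48,-0.48,-0.48")

print(summarise_genotype_columns(uk10k))
print(summarise_genotype_columns(kg))
```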


After looking at the vcftools subroutine _format_line_hash, I cannot tell you what is wrong. The code spans several pages and is undocumented. Grrr - bad coding.


I suggest you ask this question on the vcftools help mailing list (vcftools-help (at) lists.sourceforge.net), where it is more likely to be seen by the developer of the Perl module.
