vcf-subset error: "Wrong number of fields" in vcf file
7.3 years ago
Lylthera • 0

Hi,

I'm having a problem while subsetting a vcf-file (from here: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ ) using the "VCFtools" function vcf-subset.

I downloaded the file for chromosome 11 from the link above and wanted to extract 11 samples using:

vcf-subset -c HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,HG00103,HG00105,HG00106,HG00107,HG00108 -e chr11.vcf.gz > chr11_subset.vcf

I've used the exact same command on many other chromosomes before; for chromosomes 22 down to 12 it worked perfectly fine, and I extracted the same samples.

What I now get is an error with the message

Wrong number of fields in vcf_files/chr11.vcf.gz; expected 2513, got 1529. The offending line was:
[11    107608645    rs556912820    ATTTG    A    100    PASS    AC=9;AF=0.00179712;AN=5008;NS=2504;DP=20206;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.0092;VT=INDEL  (then follows a list of genotypes, here all 0|0 )

Does anybody know how to deal with this? I already googled the error but I couldn't find a related problem that was actually solved.

Thanks in advance for any advice!

vcf vcftools variant 1000genomes vcf-subset • 4.0k views

Is this the last line of the chr11.vcf file? If so, the file may be corrupted, or you may have run out of disk space while it was being written. You seem to be missing an awful lot of fields (about 1,000).


I'm not sure whether it really is the last line (it might be); I haven't been able to open the unsplit file yet because of its enormous size. I'm also confused about where I would have run out of space: it worked fine before, and I'm doing nothing differently. Yes, the error says about 1,000 fields are missing; I just don't know why or where, or how to cope with this.


If that happened, the only thing you can do is re-run the script. A quick check is to use ls -lh to see whether the size of the file is correct.
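The size check can be combined with an integrity test of the compressed stream. Here is a minimal sketch, assuming the file is gzip-compatible (as .vcf.gz files normally are); the function name check_gz is made up for illustration:

```shell
# check_gz FILE: print the exact size in bytes and test the gzip stream.
# Returns non-zero if the file is missing, empty, or fails the integrity test.
check_gz() {
    file=$1
    [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
    # An exact byte count is easier to compare with the server listing than ls -lh
    echo "size in bytes: $(wc -c < "$file" | tr -d ' ')"
    # gzip -t decompresses without writing output; a cut-off download fails here
    if gzip -t "$file" 2>/dev/null; then
        echo "gzip stream OK"
    else
        echo "gzip stream truncated or corrupt" >&2
        return 1
    fi
}
```

Calling check_gz chr11.vcf.gz and comparing the byte count with the size listed on the FTP server should show whether the download was cut short.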


Thanks for that tip with the size check - the size is indeed smaller than it should be according to the given download link, so I guess it's just a download error! Thank you again!

6.8 years ago

I have had the exact same issue while trying to split a VCF file by sample. It is a large VCF file (~9 million lines), and the run seems to get stuck at around line 400,000. None of my output files contain anything past chromosome 1.

I've tried to check the formatting of the file (column number) and it looks consistent. It seems to stop in the same place each time though.

vcf-subset -c C25P GW_full_PASS.vcf.gz > C25P.vcf
Wrong number of fieldsin GW_full_PASS.vcf.gz; expected 33, got 29. The offending line was:
[1      177704242       rs10913414      A       G       2965    PASS    .       GT:GL:GOF:GQ:NR:NV      0/0:0,-6.32,-81.8:37:63:22:0    0/0:0,-6.32,-81.2:63:63:22:0    0/0:0,-11.11,-135.3:33:99:38:0  0/0:0,-6.75,-95.9:72:68:27:1    0/0:0,-9.63,-123.5:20:96:33:0   0/1:-29.38,0,-36.68:41:99:19:9  0/0:0,-10.24,-129.7:67:99:33:0  0/0:0,-9.91,-121.2:44:99:36:0   0/0:0,-5.99,-75.3:51:60:21:0    0/1:-51.88,0,-13.58:18:99:21:16 0/1:-42.38,0,-82.08:68:99:36:12 0/0:0,-5.12,-66.5:49:51:16:0    0/1:-41.16,0,-62.66:46:99:35:14 0/1:-13.98,0,-40.18:54:99:16:5  0/1:-64.28,0,-39.78:59:99:34:20 0/1:-30.68,0,-36.88:48:99:22:9  0/1:-61.29,0,-50.09:39:99:38:21 0/0:0,-6.3,-76.8:37:63:22:0     0/0:0,-6.3,-96.5:20:63:28:1     0/]

But the actual line looks normal (I checked it in Notepad++); the tool just seems to have chopped off some columns:

cat GW_full_PASS.vcf | grep rs10913414
1       177704242       rs10913414      A       G       2965    PASS    .       GT:GL:GOF:GQ:NR:NV      0/0:0,-6.32,-81.8:37:63:22:0    0/0:0,-6.32,-81.2:63:63:22:0    0/0:0,-11.11,-135.3:33:99:38:0  0/0:0,-6.75,-95.9:72:68:27:1    0/0:0,-9.63,-123.5:20:96:33:0   0/1:-29.38,0,-36.68:41:99:19:9  0/0:0,-10.24,-129.7:67:99:33:0  0/0:0,-9.91,-121.2:44:99:36:0   0/0:0,-5.99,-75.3:51:60:21:0    0/1:-51.88,0,-13.58:18:99:21:16 0/1:-42.38,0,-82.08:68:99:36:12 0/0:0,-5.12,-66.5:49:51:16:0    0/1:-41.16,0,-62.66:46:99:35:14 0/1:-13.98,0,-40.18:54:99:16:5  0/1:-64.28,0,-39.78:59:99:34:20 0/1:-30.68,0,-36.88:48:99:22:9  0/1:-61.29,0,-50.09:39:99:38:21 0/0:0,-6.3,-76.8:37:63:22:0     0/0:0,-6.3,-96.5:20:63:28:1     0/1:-50.46,0,-67.86:38:99:39:18 0/0:0,-8.13,-107:73:81:26:0     0/0:0,-6.02,-81.2:20:60:20:0    1/1:-103.6,-8.13,0:32:81:28:28  1/1:-126.2,-9.91,0:37:99:33:33
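
Note that the grep above reads the uncompressed GW_full_PASS.vcf, while vcf-subset was given GW_full_PASS.vcf.gz. The column-count check can be repeated on the compressed file itself. A rough sketch, assuming a gzip-compatible file; count_fields is a hypothetical helper name:

```shell
# count_fields FILE.vcf.gz: tabulate how many records have each number of
# tab-separated fields, reading the compressed file rather than the plain VCF.
count_fields() {
    # Skip "##" meta lines; the "#CHROM" header has the same width as the records
    gzip -dc "$1" 2>/dev/null |
        awk -F'\t' '!/^##/ { n[NF]++ } END { for (k in n) print k, n[k] }' |
        sort -n
}
```

Each output line is a field count followed by the number of lines with that count. A healthy file should give a single line; a second line with a smaller field count (29 instead of 33 in the error above) would point to the compressed copy being cut off mid-record.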
6.8 years ago

I've just answered my own question: the error occurred because the compression (bzip) of the input VCF file didn't complete properly. I'm not sure exactly why this happened, but it is fixed now :-D
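
The post doesn't say how the file was repaired. One plausible route, assuming the uncompressed VCF is still intact, is to recompress it and verify the result before rerunning vcf-subset. This is only a sketch: recompress_vcf is a hypothetical helper, and it uses bgzip when it is installed and falls back to plain gzip otherwise.

```shell
# recompress_vcf IN.vcf OUT.vcf.gz: rebuild the compressed file and verify it.
recompress_vcf() {
    src=$1
    dst=$2
    if command -v bgzip >/dev/null 2>&1; then
        bgzip -c "$src" > "$dst" || return 1   # BGZF output, still readable by gzip
    else
        gzip -c "$src" > "$dst" || return 1    # plain gzip fallback
    fi
    # The new stream must test clean and decompress to exactly the input bytes
    gzip -t "$dst" || return 1
    gzip -dc "$dst" | cmp -s - "$src"
}
```

For example, recompress_vcf GW_full_PASS.vcf GW_full_PASS.vcf.gz overwrites the damaged archive; if the function returns success, the vcf-subset command can be run again.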


Can you elaborate on how you fixed this? Thanks.
