I cannot extract all of the contents of a gzipped vcf file. The file is part of an encrypted tarball I downloaded from dbGaP. After decryption, I was able to extract a directory of files with this command:
tar -xvf phg001.tar
When I used Plink to convert one of the extracted vcf files to a bed file, I got an error message: Error: Line 20 of .vcf file has fewer tokens than expected.
I counted the number of lines in the file with the help of zcat.
zcat chr22-filtered.dose.vcf.gz | wc -l
Output:
gzip: chr22-filtered.dose.vcf.gz: decompression OK, trailing garbage ignored
19
And if I try to unzip the file, I get a similar message about trailing garbage.
gzip: test22.vcf.gz: decompression OK, trailing garbage ignored
The file is too large to have only 20 lines, and if I count the number of lines without using zcat, there is indeed more to the file.
wc -l chr22-filtered.dose.vcf.gz
3632730 chr22-filtered.dose.vcf.gz
How can I extract all of the contents of the zipped file?
All advice is appreciated.
Paul
Most likely an error happened during the download, and you will need to get the file again. If that's not it, the error happened either when the file was created or when it was uploaded.
To the best of my knowledge, there is no magic command that cleanly decompresses a file that has errors in it.
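Before downloading again, it may be worth confirming that the file really is damaged. Note that wc -l on the .gz itself only counts newline bytes in the compressed binary data, so 3632730 is not the number of lines in the VCF. A minimal sketch, assuming GNU gzip is installed; check_gz is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: report whether a .gz file passes gzip's integrity test.
check_gz() {
    # gzip -t decompresses in memory and verifies the checksum;
    # nothing is written to disk. Any nonzero exit status (an error,
    # or a warning in GNU gzip) is reported here as "damaged".
    if gzip -t "$1" 2>/dev/null; then
        echo "ok"
    else
        echo "damaged"
    fi
}
```

Usage: check_gz chr22-filtered.dose.vcf.gz
If the source publishes a file size or checksum, comparing it against ls -l or md5sum on your copy is another way to tell a bad download from a file that was already broken when it was uploaded.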