Question

gunzip error: trailing garbage ignored


Posted 12 months ago by pwjeffries

I cannot extract all of the contents of a gzipped VCF file. The file is part of an encrypted tarball I downloaded from dbGaP. After decrypting it, I extracted a directory of files with this command:

 tar -xvf phg001.tar

When I used PLINK to convert one of the extracted VCF files to a BED file, I got this error message: Error: Line 20 of .vcf file has fewer tokens than expected.
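One quick way to see what PLINK is objecting to is to compare the number of tab-separated fields on the `#CHROM` header line with the number on the last line that gzip manages to decode. The function below is a hypothetical helper written for illustration (`vcf_field_counts` is not a standard tool); it assumes a tab-delimited VCF and the usual `gzip` and `awk`.

```shell
# Hypothetical helper (not a standard tool): print the field count of the
# #CHROM header line and of the last line gzip can decode.
# gzip's "trailing garbage" warnings are discarded via 2>/dev/null.
vcf_field_counts() {
  f=$1
  # Number of tab-separated columns on the first #CHROM line.
  header=$(gzip -dc -- "$f" 2>/dev/null | awk -F'\t' '/^#CHROM/ {print NF; exit}')
  # Number of columns on the final (possibly incomplete) decoded line.
  last=$(gzip -dc -- "$f" 2>/dev/null | tail -n 1 | awk -F'\t' '{print NF}')
  echo "header_fields=${header:-none} last_line_fields=${last:-none}"
}
```

Usage: `vcf_field_counts chr22-filtered.dose.vcf.gz`. If the last decoded line has fewer fields than the header, the decompressed stream stops in the middle of a record, which matches PLINK's complaint.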

I counted the number of lines in the file with the help of zcat:

zcat chr22-filtered.dose.vcf.gz | wc -l

Output:

gzip: chr22-filtered.dose.vcf.gz: decompression OK, trailing garbage ignored
19

I get a similar message about trailing garbage if I try to gunzip the file:

gzip: test22.vcf.gz: decompression OK, trailing garbage ignored

The file is far too large to contain only 20 lines, and running wc -l directly on the compressed file (without zcat) shows that there is much more data in it:

wc -l chr22-filtered.dose.vcf.gz
3632730 chr22-filtered.dose.vcf.gz
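Note that `wc -l` on the compressed file counts newline bytes (0x0A) that happen to occur in the binary gzip data, so 3632730 is not the number of VCF lines; it only confirms that the file is large. A more direct check is to compare the compressed size on disk with the number of bytes gzip actually recovers. `gz_recovery_report` below is a hypothetical helper for illustration, built only on `wc`, `tr`, and `gzip`.

```shell
# Hypothetical helper: compare the file's size on disk with the number of
# decompressed bytes gzip recovers before it stops.
gz_recovery_report() {
  f=$1
  # tr strips the leading spaces some wc implementations print.
  compressed=$(wc -c < "$f" | tr -d ' ')
  recovered=$(gzip -dc -- "$f" 2>/dev/null | wc -c | tr -d ' ')
  echo "compressed_bytes=$compressed recovered_bytes=$recovered"
}
```

Because VCF text normally compresses well, a healthy file should decompress to many times its on-disk size; recovering only a few kilobytes from a large file means most of its contents were never decoded.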

How can I extract all of the contents of the gzipped file?

All advice is appreciated.
Paul

Tags: vcf, gzip, gz, plink


Most likely an error happened during the download, and you will need to get the file again. If that's not it, an error occurred either when the file was created or when it was uploaded.
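After downloading and extracting again, it is worth testing each file before converting it. `gzip -t` decompresses without writing any output; GNU gzip exits with status 1 on errors and 2 on warnings such as trailing garbage. The wrapper below is a hypothetical convenience function, not part of gzip. If the download came with checksums (an assumption; check what dbGaP provides for this dataset), comparing them as well would be a stronger check.

```shell
# Hypothetical wrapper around `gzip -t`: succeed only if the whole file
# decompresses with no errors and no warnings (gzip exit status 0).
gz_ok() {
  if gzip -t -- "$1" 2>/dev/null; then
    echo "OK: $1"
  else
    # $? here is the exit status of the gzip -t test above.
    echo "FAILED (gzip exit status $?): $1"
    return 1
  fi
}
```

Usage: `for f in *.vcf.gz; do gz_ok "$f"; done` reports every extracted file that is damaged, rather than discovering the problem one PLINK run at a time.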

To the best of my knowledge there is no magical command to cleanly decompress a file that has errors in it.

Reply by Mensur Dlakic, 12 months ago