Question: fastq.gz goes into gz cpgz loop
3 months ago by
S0phia0
USA
S0phia0 wrote:

We used a sequencing service, and they gave us a 100 GB tar file to download. After downloading it, I checked the md5sum and it matches theirs. But after I unpacked the tar file and found fastq.gz files inside a folder, I ran gunzip -c filename.fastq.gz | head and got a "not in gzip format" error. Running file filename.fastq.gz reports "data" (not "gzip compressed data", as I would expect). When I just double-click a fastq.gz file, it goes into a gz cpgz loop. Is it possible that they gave us corrupt files?

rna-seq • 314 views
ADD COMMENTlink modified 12 weeks ago • written 3 months ago by S0phia0

What is the output of file xxx.tar and file xxxx.gz?

You can also ask for help from the service provider.

ADD REPLYlink written 3 months ago by shenwei3563.4k

The output is POSIX tar archive (GNU) for the tar file and data for the .gz file. I've contacted them but have had no answer so far... I just wanted to ask here to see what else I can try. Thank you.

ADD REPLYlink written 3 months ago by S0phia0

How did you unzip the tar file and on what OS?

If you did not download the file in binary mode, it could have been corrupted during the download. Since the md5sum matches, though, that possibility is slim.
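For anyone who wants to repeat the checksum comparison without the md5sum tool, here is a minimal Python sketch. The function name md5_of_file and the chunk size are my own choices for illustration, not something the provider supplies; the result should match the hex string the provider published if the download is intact.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Reading in chunks keeps memory use small even for a 100 GB archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:  # binary mode, so no bytes are altered
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Usage would be something like md5_of_file("xxx.tar") == "the checksum the provider sent". A match only shows that the download is identical to what the provider uploaded; it says nothing about whether the files inside were valid to begin with.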

ADD REPLYlink modified 3 months ago • written 3 months ago by genomax39k

I just double-clicked the tar file on my Mac (macOS Sierra)... Maybe I should try unpacking it another way. Thank you.
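One way to take the double-click extraction out of the picture is to read the members straight from the tar file and look at their first two bytes. A gzip stream always starts with the bytes 1f 8b, so if the members already lack that signature inside the archive, the extraction step is not to blame. This is only a sketch; the function name gzip_members_report is made up for this example.

```python
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def gzip_members_report(tar_path, suffix=".gz"):
    """Map each member ending in `suffix` to True/False.

    True means the member starts with the gzip magic bytes. Nothing is
    written to disk; only two bytes per member are read.
    """
    report = {}
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(suffix):
                with archive.extractfile(member) as stream:
                    report[member.name] = stream.read(2) == GZIP_MAGIC
    return report
```

If every fastq.gz member comes back False, the files were most likely not valid gzip when the archive was built, which points back at the provider rather than at the download or the unpacking.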

ADD REPLYlink written 3 months ago by S0phia0

Is the file gzipped at all? Perhaps the extension is .fastq.gz but the file is simply a plain fastq file. Try head -4 filename.fastq.gz. Also test the gzip integrity (gzip -tv <input.gz>) and list the stored CRC and sizes (gzip -lv <input.gz>). When the file command reports data, it means that file could not determine the type of the file.
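The same two questions (is there a gzip header at all, and does the whole stream decompress cleanly) can also be asked from Python. The sketch below is roughly comparable to gzip -t, under the assumption that reading the stream to the end is enough to trigger the CRC and length checks; the function name inspect_gzip and the returned messages are made up for this example.

```python
import gzip
import zlib


def inspect_gzip(path):
    """Return a short verdict on whether `path` is a readable gzip file."""
    with open(path, "rb") as handle:
        has_magic = handle.read(2) == b"\x1f\x8b"
    if not has_magic:
        return "not gzip (magic bytes missing)"
    try:
        # Decompress everything and discard it; errors surface while reading.
        with gzip.open(path, "rb") as stream:
            while stream.read(1024 * 1024):
                pass
    except (OSError, EOFError, zlib.error) as error:
        return f"gzip header present but stream is damaged: {error}"
    return "ok"
```

A file that file reports as data would be expected to land in the first branch, which matches the "not in gzip format" message from gzip itself.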

ADD REPLYlink modified 3 months ago • written 3 months ago by cpad01123.8k

I looked at the content with the head command, and it shows gibberish (lots of question marks, some numbers and letters). I also tried renaming the file to filename.fastq to see what would happen, and it is still gibberish. For the other commands you suggested, I get gzip: filename.fastq.gz: not in gzip format, filename.fastq.gz: NOT OK, and not in gzip format. I guess at this point it's clear that the files I've got are not gzip files, even though their names look like it. Thank you very much for your help.

ADD REPLYlink written 3 months ago by S0phia0

Could you please paste the result of:

file xxxx.gz
ADD REPLYlink written 3 months ago by shenwei3563.4k

This is exactly what it says:

filename.fastq.gz: data
ADD REPLYlink written 3 months ago by S0phia0

Since you are on macOS, try The Unarchiver from the App Store. It is supposed to handle several formats, including cpgz. My guess (from googling) is that you might have run into the problem explained here: http://osxdaily.com/2013/02/13/open-zip-cpgz-file/. Let us know if any of the methods works, for future reference.

ADD REPLYlink modified 3 months ago • written 3 months ago by cpad01123.8k
12 weeks ago by
S0phia0
USA
S0phia0 wrote:

Finally, an update. I wanted to leave this here so that others who run into this problem can use this post as a reference. I heard back from the sequencing provider, and they had to redo the fastq files (I'm not sure exactly what they had to redo, but that's what they told me). The new tar file was half the size of the original, and the fastq.gz files all behave normally: I could simply double-click one and it turned into a readable fastq file, and the file command returned gzip compressed data, extra field on all the files. I guess their gzip step went wrong the first time around. Thank you so much for all your help.

ADD COMMENTlink written 12 weeks ago by S0phia0