fastq.gz unexpected end of file: Can this be fixed?
2.9 years ago
rebeliscu ▴ 60

Something appears to be wrong with one of my fastq files: Blood_ACAGTG_L002_R2_010.fastq.gz

I first noticed an error when trying to trim this file (with its R1 counterpart) with trimmomatic:

java -jar /home/shared/programs/Trimmomatic-0.39/trimmomatic-0.39.jar PE -threads 15 Blood_ACAGTG_L002_R1_010.fastq.gz Blood_ACAGTG_L002_R2_010.fastq.gz /mnt/bdata/shared/SF10711_exome/gbm_14_009_trimmed.fastq.gz ILLUMINACLIP:NexteraPE-PE.fa:2:30:12 LEADING:8 TRAILING:8 SLIDINGWINDOW:4:20 MINLEN:60

java.io.EOFException: Unexpected end of ZLIB input stream
at java.base/java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:245)
at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:159)
at java.base/java.util.zip.GZIPInputStream.read(GZIPInputStream.java:118)
at org.usadellab.trimmomatic.util.ConcatGZIPInputStream.read(ConcatGZIPInputStream.java:73)
at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.base/java.io.InputStreamReader.read(InputStreamReader.java:181)
at java.base/java.io.BufferedReader.fill(BufferedReader.java:161)
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:326)
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:392)
at org.usadellab.trimmomatic.fastq.FastqParser.parseOne(FastqParser.java:71)
at org.usadellab.trimmomatic.fastq.FastqParser.next(FastqParser.java:179)
at org.usadellab.trimmomatic.threading.ParserWorker.run(ParserWorker.java:42)
at java.base/java.lang.Thread.run(Thread.java:829)
Exception in thread "Thread-1" java.lang.RuntimeException: java.io.EOFException: Unexpected end of ZLIB input stream
at org.usadellab.trimmomatic.threading.ParserWorker.run(ParserWorker.java:56)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
at java.base/java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:245)
at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:159)
at java.base/java.util.zip.GZIPInputStream.read(GZIPInputStream.java:118)
at org.usadellab.trimmomatic.util.ConcatGZIPInputStream.read(ConcatGZIPInputStream.java:73)
at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.base/java.io.InputStreamReader.read(InputStreamReader.java:181)
at java.base/java.io.BufferedReader.fill(BufferedReader.java:161)
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:326)
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:392)
at org.usadellab.trimmomatic.fastq.FastqParser.parseOne(FastqParser.java:71)
at org.usadellab.trimmomatic.fastq.FastqParser.next(FastqParser.java:179)
at org.usadellab.trimmomatic.threading.ParserWorker.run(ParserWorker.java:42)
... 1 more
Input Read Pairs: 3860000 Both Surviving: 3102127 (80.37%) Forward Only Surviving: 456443 (11.82%) Reverse Only Surviving: 125247 (3.24%) Dropped: 176183 (4.56%)
TrimmomaticPE: Completed successfully

Looking into this error, I was led to this thread: Error: Help understand Trimmomatic ZLIB input stream error

Trying to unzip the file, I get an 'unexpected end of file' error.

When I try to view the contents:

zcat Blood_ACAGTG_L002_R2_010.fastq.gz | tail

gzip: Blood_ACAGTG_L002_R2_010.fastq.gz: unexpected end of file
@HWI-D00328:58:H7EAEADXX:2:2215:11524:35696 2:N:0:ACAGTG
ATCTTGCCCTGCCGCACTGACTACGGCTGCTGCCGCCTTTCTATGGCTGTGCGTCTCATCCCCGCTGTCCATCTGGGAGATGGGGTCTTCCTTGTGGCGCC
+
CCCFFFFFHHHGHJJJJBIGJGIIJJJJJJFI9BFFHIJIIGGGIIGEGE;AA?B>CDEEEDD'3=BBCDAFDCDDD2<5?CCBD9<C:@CCDDAC@BD@B
@HWI-D00328:58:H7EAEADXX:2:2215:11723:35707 2:N:0:ACAGTG
TAGATTGTTAGAAAGATCCAAGTATTAAGATCTAGGGTGGCTAACTTTTCACAGACAAAAAGCTTGTTTGTAAGGTCATTTACTATACCCTTAATTCAGGA
+
==+2<@AAB?<A?BBBBB9+3=34>A,>CB4?=AC?9110;AA>ABBBB7*=AA3=>BBB2;>3A76>>BBABAA=7>?@@@>>@>@@B>>=;;?>B=;?3
@HWI-D00328:58:H7EAEADXX:2:2215:11603:35719 2:N:0:ACAGTG

When I do the same for a different, working fastq file, I get:

zcat Blood_ACAGTG_L002_R1_010.fastq.gz | tail

+
@@BFFFFFHHHHHJJJIIIJCHIIJEGHIJGJJGHJJIIJJJJJJJFGHIJJJJJJEHJJJJIJHHHFFFFFE>>>BCDDBCCDDDDDDDDDDDDC9CCDC
@HWI-D00328:58:H7EAEADXX:2:2215:18033:58714 1:N:0:ACAGTG
CTTCTTTCCTTTTAGGTGGTTCTAGATGTTGGTTGTGGATCAGGAATCCTGTCATTTTTTGCTGTACAGGCTGGAGCTAGGACAGTTTATGCAGTTGAAGC
+
@@@FFFFFHFGHH>FG<CFCEDHHCHHGHCGGCGEHCGGGHBFH@?GHDHEDFGIGI@DHHGIJGIG;?ECDBB66;A@>?CC=B@CDC>CD5::AC>>>@
@HWI-D00328:58:H7EAEADXX:2:2215:18170:58720 1:N:0:ACAGTG
GCAAAGTAGTCAGGAATCGATCTCGTGAAGCCCGCAAGGACCGAACACCCCCACCCCGATTTAGACCTACGGGTGCTGCCCCATGTCTCCCACCAAAGCCC
+
?@<DDD2A?=CDF@CGBFGICGIFGF@AE<FFFFIIFFBDFD:AFEEEC4ABDC<@BBBBB?BBBBBBBB9>B9<?@BB?@@B9?(:@@AA?BBB(<39?<

Is there anything obvious that differs between the ends of these two files that can be manually fixed?

Thanks in advance!


You probably have data files that are corrupt. You should re-download them if you can. You could try to fix them, but after you fix one error there may be another, so it is likely not worth the hassle.
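If re-downloading is not an option, a rough salvage may be possible, because gzip still writes out whatever it could decode before it hit the truncation point. The sketch below is only an illustration under assumptions: it builds a small hypothetical demo file (`demo.fastq.gz`), truncates it on purpose, and then keeps only complete 4-line records. It assumes GNU gzip and a strict 4-lines-per-record FASTQ layout.

```shell
#!/usr/bin/env bash
# Sketch: keep only the complete FASTQ records from a truncated .gz file.
# All file names here are hypothetical demo files created in a temp directory.
set -u

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a demo FASTQ with 2000 four-line records and compress it.
awk 'BEGIN { for (i = 1; i <= 2000; i++) printf "@read%d\nACGTACGTAC\n+\nIIIIIIIIII\n", i }' > demo.fastq
gzip -c demo.fastq > demo.fastq.gz
total_lines=$(wc -l < demo.fastq)

# Simulate an interrupted transfer by keeping only the first half of the bytes.
size=$(wc -c < demo.fastq.gz)
head -c $(( size / 2 )) demo.fastq.gz > truncated.fastq.gz

# Decompress whatever is readable; gzip exits non-zero here, which is expected.
gzip -dc truncated.fastq.gz > partial.fastq 2>/dev/null || true

# Keep only whole records: round the line count down to a multiple of 4.
# wc -l counts newlines, so a cut-off final line is not counted.
partial_lines=$(wc -l < partial.fastq)
keep=$(( partial_lines / 4 * 4 ))
head -n "$keep" partial.fastq | gzip > salvaged.fastq.gz

salvaged_lines=$(gzip -dc salvaged.fastq.gz | wc -l)
echo "original: $total_lines lines, salvaged: $salvaged_lines lines"
```

For paired-end data the mate file would also need to be cut to the same number of records (for example with `head -n "$keep"` on the decompressed R1), assuming both files list the reads in the same order. Every read after the truncation point is still lost, which is why re-downloading remains the better fix.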


I agree that this is best fixed by downloading the file again, because your file is truncated. The most common reasons for truncation are running out of disk space and a broken connection. Also, sequencing centres tend to delete files after a few months, so getting the downloads right is extra important. My recommendations: always download important files on the command line using scp, Aspera, wget, curl or another reliable download manager, depending on the transfer protocol provided. Do check file checksums like md5, sha..., if provided. If in doubt, prefer a sequencing centre that provides convenient methods of data transfer.
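The checks recommended above could look roughly like this. It is a minimal sketch, assuming GNU `gzip` and `md5sum` are installed; `good.fastq.gz` and `broken.fastq.gz` are hypothetical demo files created by the script, and the checksum taken from the intact copy stands in for one published by the sequencing centre. `gzip -t` is included for the case where no checksum is provided, since it tests whether the compressed stream is complete.

```shell
#!/usr/bin/env bash
# Sketch: detect a truncated download with a checksum and with gzip -t.
# All file names here are hypothetical demo files created in a temp directory.
set -u

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# An intact demo file, and the checksum a provider would publish for it.
printf '@read1\nACGT\n+\nIIII\n' | gzip > good.fastq.gz
expected=$(md5sum good.fastq.gz | awk '{ print $1 }')

# Simulate an interrupted transfer by dropping the last 5 bytes.
size=$(wc -c < good.fastq.gz)
head -c $(( size - 5 )) good.fastq.gz > broken.fastq.gz

# Check 1: compare the checksum of the downloaded copy with the expected one.
actual=$(md5sum broken.fastq.gz | awk '{ print $1 }')
if [ "$expected" = "$actual" ]; then md5_status=MATCH; else md5_status=MISMATCH; fi

# Check 2: gzip -t reads the whole stream and fails if it ends early.
check_gz() {
  if gzip -t "$1" 2>/dev/null; then echo OK; else echo CORRUPT; fi
}
good_status=$(check_gz good.fastq.gz)
broken_status=$(check_gz broken.fastq.gz)

echo "md5: $md5_status, good: $good_status, broken: $broken_status"
```

Running both checks right after every download, before the source copy can disappear, is the cheapest point at which to catch a truncated file.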
