Fastq file is truncated error message
6.8 years ago
rob.costa1234 ▴ 310

I merged several FASTQ files for the same sample (using cat), and when I ran FastQC on the merged file it reported that the file is truncated. This is what I see when I look at the tail of the file:

gzip: ABC_10_XXX_ACCTCA_L001_R2_merged.fastq.gz: unexpected end of file
+
0<0<BFFFFF0<BFF00BBFFFFIIBFFFI7B7B7<<<B<BBBBB7'00'7<7BBB0<B0''07<07<BB##############################
@HWI-ST1148:230:HAJHGAVXX:2:2106:12786:42111 2:N:0:ACCTCA
CTCTTCCGATATTTACACGGAAGAGAGGAGGATAGTTATACGGATCCGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGGATCATTA
+
<'0<BBB7'<0<000'B<BB<<<BBBB<<'070<7''07<<'''77'7077BB<<BB###########################################
ATAAGTTCCATTCAATACCATTTCTTTTGAGTCCATTCTATCTGATTCCATTCCCTTTGATTCCATAACATTTGAGTCCATTCAATTCCATTCCTTTCGT
+
BBBFF<BFFFFF<FF0BFFIIF0BFFFFF<BFFFB00<BFF'0<


I then checked each of the individual files I had passed to cat, and FastQC gives the same error message for them. Is there any way to rescue the data, and what might be causing this error? Thanks.

RNA-Seq error fastq • 12k views

What is the size of the file, what is your OS, and what is your filesystem? See https://en.wikipedia.org/wiki/Large_file_support

It does indeed look like the file is truncated. The only thing you can do is download it again and hope the original one online is OK.

The total is approximately 800 MB (400 + 350). I am using Linux and the files are in gz format. The sequencing was done by a core, and we have downloaded the files again.

Did you merge them while they were compressed, or before compressing?

I used cat to merge all the R1 / R2 gz files and wrote the output as a merged gz file.

You could try to repeat that, and make sure no errors or interruptions happen in that process. If that doesn't solve it you probably, as Devon wrote, should try to access the original file.
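One way to repeat the merge while catching errors along the way is sketched below. It assumes bash and a standard gzip; the function name safe_merge and the file names in the usage line are hypothetical, not something from this thread. It tests every input with gzip -t, concatenates only if all of them pass, and then tests the merged file as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge gzipped FASTQ parts only if every part passes gzip -t.
# Usage: safe_merge OUTPUT.fastq.gz PART1.fastq.gz PART2.fastq.gz ...
safe_merge() {
    local out=$1
    shift
    local f
    for f in "$@"; do
        # gzip -t decompresses without writing output and returns non-zero
        # when the compressed stream is damaged or cut short
        if ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt or truncated input: $f" >&2
            return 1
        fi
    done
    # Concatenated gzip members form a valid gzip file, so plain cat is enough
    cat "$@" > "$out" || return 1
    # Re-test the merged file to catch an interrupted write or a full disk
    gzip -t "$out"
}
```

For example, `safe_merge sample_R2_merged.fastq.gz sample_L001_R2.fastq.gz sample_L002_R2.fastq.gz` (placeholder names) either produces a merged file that passes gzip -t or stops and names the first bad input.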

Thanks for the input. I think I can use either md5sum or gzip. However, will either one let me check all the files in a directory at once? Otherwise it may be too tedious to run on each file separately. I googled it but could not find a way to run either tool in batch mode over a directory. All the individual files are in gz format.

You can md5sum *.gz or md5sum -c checksums_file. To use gzip on all of the files:

for f in *.gz; do
    # Print the file name, then test the compressed stream; gzip reports any error
    echo "$f"
    gzip -t "$f"
done

Have a look at the individual files then and see if only one is corrupt. If none are, just concatenate them again. BTW, gzip has an option that tests for integrity (gzip -t).

So I ran md5sum on every individual file as well as on the merged files. For a few files, FastQC (version 0.11.5) gives the errors shown below, even though these files pass the md5sum check. There are five such files, and even after re-downloading them from the source they still fail in FastQC but pass md5sum. Any suggestions from experts would be helpful.

Failed to process file: Ran out of data in the middle of a fastq entry. Your file is probably truncated

Since the re-downloaded files are still generating an error you would now want to ask the owners to look into providing new copies and/or testing the files at source to see if the corruption is in original data.

However, I don't understand why md5sum suggests the files are valid while FastQC throws an error. I have even run gzip to make sure the files are valid and not corrupted. Any insight on this would be helpful.

md5sum is format agnostic and does not know/care that the file is in fastq format. So FastQC (which is checking for valid fastq format) throws an error but md5sum does not.
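To make the distinction concrete: a file whose contents were already cut short before it was compressed and checksummed at the source will pass both md5sum and gzip -t, because the bytes you downloaded match what was published and the gzip stream itself is complete. Only a format-level check notices the broken record. The following is a rough sketch of such a check, assuming bash, gzip and a POSIX awk, and assuming four-line FASTQ records (no wrapped sequences); the function name check_fastq_gz is made up for illustration, and this is not how FastQC itself is implemented.

```shell
#!/usr/bin/env bash
# Hypothetical helper: structural FASTQ check on a gzipped file.
# Prints the first problem found and returns non-zero; silent on success.
# Assumes every record is exactly four lines (header, sequence, +, quality).
check_fastq_gz() {
    gzip -dc "$1" 2>/dev/null | awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { print "line " NR ": header does not start with @"; bad = 1; exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { print "line " NR ": separator does not start with +"; bad = 1; exit }
        NR % 4 == 0 && length($0) != seqlen { print "line " NR ": quality length differs from sequence length"; bad = 1; exit }
        END {
            # A line count that is not a multiple of 4 means the last record is incomplete
            if (!bad && NR % 4 != 0) { print "file ends in the middle of a record (" NR " lines)"; bad = 1 }
            exit bad
        }'
}
```

Running `for f in *.gz; do check_fastq_gz "$f" || echo "FAILED: $f"; done` would flag a file like the one in the original post, where the last quality string is shorter than its sequence. If the five files fail a check like this but pass md5sum and gzip -t, the truncation most likely happened at the source before the files were compressed.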