Fastq file is truncated error message
8.5 years ago
rob.costa1234

I merged several FASTQ files from the same sample with cat, and when I ran FastQC it reported that the file is truncated. When I looked at the tail of the merged file, I saw:

gzip: ABC_10_XXX_ACCTCA_L001_R2_merged.fastq.gz: unexpected end of file
+
0<0<BFFFFF0<BFF00BBFFFFIIBFFFI7B7B7<<<B<BBBBB7'00'7<7BBB0<B0''07<07<BB##############################
@HWI-ST1148:230:HAJHGAVXX:2:2106:12786:42111 2:N:0:ACCTCA
CTCTTCCGATATTTACACGGAAGAGAGGAGGATAGTTATACGGATCCGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGGATCATTA
+
<'0<BBB7'<0<000'B<BB<<<BBBB<<'070<7''07<<'''77'7077BB<<BB###########################################
@HWI-ST1148:230:HAJHGADXX:2:2106:12909:42165 2:N:0:ACCTCA
ATAAGTTCCATTCAATACCATTTCTTTTGAGTCCATTCTATCTGATTCCATTCCCTTTGATTCCATAACATTTGAGTCCATTCAATTCCATTCCTTTCGT
+
BBBFF<BFFFFF<FF0BFFIIF0BFFFFF<BFFFB00<BFF'0<
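Here is a minimal sketch of how to reproduce that inspection. It assumes bash and GNU gzip on Linux, and inspect_fastq_gz is a hypothetical helper name, not an existing tool. Each FASTQ record is four lines, so the last N records are the last 4N decompressed lines. gzip writes its "unexpected end of file" message to stderr when decompression hits the cut-off, and tail prints nothing until its input ends, so that message shows up before the records. gzip -t then confirms whether the compressed stream itself is damaged.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last RECORDS FASTQ records (4 lines each) of a
# gzipped file, then test the gzip stream. Returns 1 if the stream is damaged.
# Usage: inspect_fastq_gz FILE [RECORDS]   (RECORDS defaults to 2)
inspect_fastq_gz() {
    local file=$1
    local records=${2:-2}
    # gzip -dc decompresses to stdout; a truncated stream makes it print
    # "unexpected end of file" on stderr, but whatever was decoded still reaches tail.
    gzip -dc -- "$file" | tail -n $((records * 4))
    # gzip -t decompresses without writing any output and fails on a damaged stream.
    if gzip -t -- "$file" 2>/dev/null; then
        echo "gzip stream OK: $file"
    else
        echo "gzip stream DAMAGED: $file"
        return 1
    fi
}
```

For example, inspect_fastq_gz ABC_10_XXX_ACCTCA_L001_R2_merged.fastq.gz 3 would print the last three records followed by the gzip status line.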

I then checked each of the individual files I had concatenated, and FastQC gives the same error message for them too. Is there any way I can rescue the data, or what might be the reason for this error? Thanks

Tags: RNA-Seq, error, fastq

What is the size of the file, what OS are you using, and what is your filesystem? (See https://en.wikipedia.org/wiki/Large_file_support.)

It does indeed look like the file is truncated. The only thing you can do is download it again and hope the original one online is OK.

The total is approximately 800 MB (400 + 350). I am using Linux, and the files are gzip-compressed. The sequencing was done by a core facility, and we downloaded the files again.

Did you merge them while they were compressed, or before compressing them?

I used cat to merge all of the compressed R1 .gz files (and, separately, all of the R2 .gz files) into a merged .gz output.
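Concatenating .gz files with cat is a legitimate way to merge them. The gzip format allows a file to hold several compressed members back to back, and gzip -dc decompresses them in sequence. Below is a minimal sketch, assuming bash (for process substitution) and GNU gzip. verify_merge is a hypothetical helper that confirms a merged file decompresses to exactly the inputs' contents, in the order given.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that MERGED is a valid gzip file whose decompressed
# content equals the decompressed INPUTs, concatenated in the order given.
# Usage: verify_merge MERGED INPUT...
verify_merge() {
    local merged=$1; shift
    # Without inputs, gzip -dc would read stdin, so refuse instead.
    [ $# -gt 0 ] || return 2
    # The merged file must be a valid (possibly multi-member) gzip stream.
    gzip -t -- "$merged" || return 1
    # Compare decompressed bytes; cmp -s prints nothing and returns 0 only if identical.
    cmp -s <(gzip -dc -- "$merged") <(gzip -dc -- "$@")
}
```

For example, verify_merge R2_merged.fastq.gz lane1_R2.fastq.gz lane2_R2.fastq.gz (hypothetical file names, listed in the same order they were given to cat). Because it decompresses everything twice, it costs a few minutes on files of this size, but it checks the merge end to end.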

You could try repeating that, making sure no errors or interruptions happen during the process. If that doesn't solve it, you should probably, as Devon wrote, try to access the original file.
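One way to make sure the merge runs without errors is a four-step sequence. Test every input with gzip -t before concatenating, write to a temporary file, test that output, and only then give it its final name. Here is a minimal sketch, assuming bash and GNU gzip. The name safe_merge_gz and the .partial suffix are hypothetical choices, not part of any tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: merge gzipped FASTQ files only if every input passes gzip -t,
# then re-test the result. Usage: safe_merge_gz OUTPUT INPUT...
safe_merge_gz() {
    local out=$1; shift
    if [ $# -eq 0 ]; then
        echo "usage: safe_merge_gz OUTPUT INPUT..." >&2
        return 2
    fi
    local f
    for f in "$@"; do
        # Stop before writing anything if any input is already damaged.
        if ! gzip -t -- "$f"; then
            echo "input failed gzip -t: $f" >&2
            return 1
        fi
    done
    # Write under a temporary name so an interrupted run never leaves a
    # half-written file under the final name.
    local tmp="$out.partial"
    if ! cat -- "$@" > "$tmp" || ! gzip -t -- "$tmp"; then
        echo "merge failed for $out" >&2
        rm -f -- "$tmp"
        return 1
    fi
    mv -- "$tmp" "$out"
}
```

If an input is already truncated, as the individual files in this thread appear to be, the function stops at that input. Re-running the merge cannot repair it, and a fresh copy from the source is needed.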

Thanks for the input. I think I can use either md5sum or gzip, but will either one let me check all of the files in a directory at once? Otherwise it may be tedious to check each file separately. I searched but could not find a way to run either tool in batch over a directory. All of the individual files are .gz.

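You can run md5sum *.gz to compute checksums for all of the files at once, or md5sum -c checksums_file to verify them against a list of checksums. To test all of the files with gzip:

for f in *.gz; do
    echo "$f"
    gzip -t "$f" || echo "FAILED: $f"
done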
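To go one step beyond md5sum *.gz, the checksums can be written to a file once and re-checked later with md5sum -c. Here is a minimal sketch, assuming GNU coreutils md5sum on Linux. make_checksums, verify_checksums and the file name checksums.md5 are all hypothetical. Checksums you compute yourself only detect changes made after that point. To show that your downloads match the originals, verify against the checksum list provided by the data source, if there is one.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; "checksums.md5" is an assumed file name.
# Usage: make_checksums DIR ; verify_checksums DIR

make_checksums() {
    local dir=$1
    # Run in a subshell so the cd does not affect the caller; record one
    # "<md5>  <file>" line per .gz file in the directory.
    ( cd -- "$dir" && md5sum -- *.gz > checksums.md5 )
}

verify_checksums() {
    local dir=$1
    # md5sum -c re-hashes every listed file; --quiet suppresses the per-file
    # "OK" lines so only failures are printed. Non-zero exit on any mismatch.
    ( cd -- "$dir" && md5sum -c --quiet checksums.md5 )
}
```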

Then have a look at the individual files and see whether only one of them is corrupt. If none are, just concatenate them again. By the way, gzip has an option that tests file integrity (gzip -t).

I ran md5sum on each individual file as well as on the merged files. A few files pass the md5sum check, yet FastQC (version 0.11.5) reports the errors below for them. These five files give FastQC errors even after being re-downloaded from the source, but they still pass md5sum. Any suggestions would be helpful.

Failed to process file: Ran out of data in the middle of a fastq entry. Your file is probably truncated
Failed to process file: Midline 'AAAAAAFIIIIIAAATBFFFFFAATAFIIIIIAABFFFFFACGAFIFIIII' didn't start with +
Failed to process file: Midline @HWI-ST1148:191:H…..XX.2:109:11711 1N:0:ACTAG didn't start with +
Failed to process file: Midline 'AGBFFTGA' didn't start with +
Failed to process file: Ran out of data in the middle of a fastq entry. Your file is probably truncated

Since the re-downloaded files still produce an error, you should now ask the data owners to provide new copies and/or test the files at the source to see whether the corruption is in the original data.

However, I don't understand why md5sum suggests the files are valid while FastQC throws an error. I have even used gzip to make sure the files are valid and not corrupted. Any insight from this perspective would be helpful.

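md5sum is format-agnostic: it only computes a checksum of the bytes and does not know or care that the file is in FASTQ format. A matching checksum shows only that your copy is byte-for-byte identical to the file the checksum was computed from; if that file was already malformed, the checksum still matches. Likewise, gzip -t only confirms that the compressed stream is intact, not that its contents are valid FASTQ. FastQC, which parses the records, is the only one of the three that checks the FASTQ format, so it is the only one that reports an error.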
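The sketch below shows the kind of check FastQC performs that md5sum and gzip -t cannot. It is a minimal FASTQ structure check, assuming bash, GNU gzip, awk, and the standard unwrapped four-line FASTQ record. validate_fastq_gz is a hypothetical helper. It is not FastQC's implementation and checks less, but it flags the same two kinds of problem reported above: a separator line that does not start with +, and a file that ends partway through a record. It also flags a quality string whose length differs from its sequence, as in the cut-off last record shown in the question.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a gzipped FASTQ file consists of complete
# 4-line records. Prints the first problem found, or "<n> records OK".
# Assumes unwrapped FASTQ (one sequence line and one quality line per record).
validate_fastq_gz() {
    local -a rc
    gzip -dc -- "$1" | awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { printf "line %d: header does not start with @\n", NR; bad = 1; exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { printf "line %d: midline does not start with +\n", NR; bad = 1; exit }
        NR % 4 == 0 && length($0) != seqlen { printf "line %d: quality length %d, sequence length %d\n", NR, length($0), seqlen; bad = 1; exit }
        END {
            if (bad) exit 1
            if (NR % 4 != 0) { printf "file ends in the middle of a record (line %d)\n", NR; exit 1 }
            printf "%d records OK\n", NR / 4
        }'
    rc=("${PIPESTATUS[@]}")
    # Fail if gzip (damaged stream) or awk (bad record) reported an error.
    [ "${rc[0]}" -eq 0 ] && [ "${rc[1]}" -eq 0 ]
}
```

A file can pass both md5sum -c and gzip -t and still fail this check, which matches the situation described in the thread.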
