Question

Fastq file is truncated error message

0

7.9 years ago

rob.costa1234 ▴ 310

I merged several FASTQ files for the same sample (using cat), and when I ran FastQC it gave me an error saying the file is truncated. When I looked at the tail of the file:

gzip: ABC_10_XXX_ACCTCA_L001_R2_merged.fastq.gz: unexpected end of file
+
0<0<BFFFFF0<BFF00BBFFFFIIBFFFI7B7B7<<<B<BBBBB7'00'7<7BBB0<B0''07<07<BB##############################
@HWI-ST1148:230:HAJHGAVXX:2:2106:12786:42111 2:N:0:ACCTCA
CTCTTCCGATATTTACACGGAAGAGAGGAGGATAGTTATACGGATCCGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGGATCATTA
+
<'0<BBB7'<0<000'B<BB<<<BBBB<<'070<7''07<<'''77'7077BB<<BB###########################################
@HWI-ST1148:230:HAJHGADXX:2:2106:12909:42165 2:N:0:ACCTCA
ATAAGTTCCATTCAATACCATTTCTTTTGAGTCCATTCTATCTGATTCCATTCCCTTTGATTCCATAACATTTGAGTCCATTCAATTCCATTCCTTTCGT
+
BBBFF<BFFFFF<FF0BFFIIF0BFFFFF<BFFFB00<BFF'0<

I then also checked each of the individual files I had used for the cat, and FastQC gives the same error message for them. Is there any way I can rescue the data, and what might be the reason for this error message? Thanks

RNA-Seq error fastq • 15k views

ADD COMMENT • link 7.9 years ago by rob.costa1234 ▴ 310

1

What's the size of the file, what's your OS, and what's your filesystem? See https://en.wikipedia.org/wiki/Large_file_support

ADD REPLY • link 7.9 years ago by Pierre Lindenbaum 161k

0

It does indeed look like the file is truncated. The only thing you can do is download it again and hope the original one online is OK.

ADD REPLY • link 7.9 years ago by Devon Ryan 104k

0

The total is approx. 800 MB (400 + 350). I used Linux and the file is in gzip format. The sequencing was done by a core facility, and we have downloaded the file again.

ADD REPLY • link 7.9 years ago by rob.costa1234 ▴ 310

0

Did you merge them while they were compressed, or before compressing?

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

0

I used cat to merge all the R1 / R2 gz files and write the output to a merged gz file.

ADD REPLY • link 7.9 years ago by rob.costa1234 ▴ 310

1

You could try repeating that, making sure no errors or interruptions happen in the process. If that doesn't solve it, you should probably, as Devon wrote, try to access the original file.
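As a rough sketch of what "repeat it and make sure nothing went wrong" could look like: the function below tests every input with gzip -t, concatenates with cat, tests the merged file, and compares line counts. The name merge_and_verify and its argument order are made up for illustration, and it assumes plain 4-line FASTQ records.

```shell
# Hypothetical helper (not part of any tool): merge gzipped FASTQ files
# with cat and verify the result.
# Usage: merge_and_verify OUTPUT.fastq.gz INPUT1.fastq.gz [INPUT2.fastq.gz ...]
merge_and_verify() {
    local out f n expected merged
    out=$1
    shift
    expected=0

    # Test every input first, so a corrupt part is caught before merging.
    for f in "$@"; do
        gzip -t "$f" || { echo "corrupt input: $f" >&2; return 1; }
        n=$(( $(gzip -dc "$f" | wc -l) ))
        expected=$((expected + n))
    done

    # Concatenated gzip members still form a valid gzip stream.
    cat "$@" > "$out" || return 1
    gzip -t "$out" || { echo "merged file failed gzip -t: $out" >&2; return 1; }

    # The merged file should contain exactly the lines of all its inputs.
    merged=$(( $(gzip -dc "$out" | wc -l) ))
    if [ "$merged" -ne "$expected" ]; then
        echo "line count mismatch: expected $expected, got $merged" >&2
        return 1
    fi
    # Assumes 4 lines per record when reporting the record count.
    echo "OK: $out has $merged lines ($((merged / 4)) records)"
}
```

If the function reports a corrupt input, that file is the one to fetch again; if only the merged file fails, the cat step itself was probably interrupted.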

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

0

Thanks for the input. I think I can use either md5sum or gzip. However, will either one let me check all the files in a directory? Otherwise it may be too complicated to do it for each file. I googled it but could not find a way to run either tool in batch over a directory. All the individual files are gzipped.

ADD REPLY • link 7.9 years ago by rob.costa1234 ▴ 310

2

You can run md5sum *.gz, or md5sum -c checksums_file to verify the files against a list of checksums. To use gzip on all of the files:

for f in *.gz; do
    echo "$f"
    gzip -t "$f"
done
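
For the md5sum side, a minimal sketch of the write-then-verify workflow might look like the following. The function names and the file name checksums.md5 are assumptions for illustration; in practice the checksum list would ideally come from whoever produced the files.

```shell
# Hypothetical helpers; checksums.md5 is an assumed file name.

# Record one checksum per .gz file in directory $1.
# Run this where the files are known to be good (e.g. at the source).
write_checksums() {
    ( cd "$1" && md5sum *.gz > checksums.md5 )
}

# Re-check every file listed in checksums.md5 in directory $1.
# md5sum -c prints one line per file and exits non-zero if any file differs.
verify_checksums() {
    ( cd "$1" && md5sum -c checksums.md5 )
}
```

Note that this only tells you whether a copy matches the file the checksum was computed from, not whether that file was valid to begin with.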

ADD REPLY • link 7.9 years ago by Devon Ryan 104k

0

Have a look at the individual files then and see if only one is corrupt. If none are, just concatenate them again. BTW, there's a gzip option that tests for integrity.

ADD REPLY • link 7.9 years ago by Devon Ryan 104k

0

So I ran md5sum on every individual file as well as on the merged files. For a few files, FastQC (version 0.11.5) gives me the errors shown below, even though these files pass the md5sum test. These five files give an error in FastQC even after re-downloading them from the source, but they pass md5sum. Any suggestions from experts would be helpful.

Failed to process file: Ran out of data in the middle of a fastq entry. Your file is probably truncated
Failed to process file: Midline 'AAAAAAFIIIIIAAATBFFFFFAATAFIIIIIAABFFFFFACGAFIFIIII didn't start with +
Failed to process file: Midline @HWI-ST1148:191:H…..XX.2:109:11711 1N:0:ACTAG didn't start with +
Failed to process file:Midline 'AGBFFTGA' didn't start with +
Failed to process file: Ran out of data in the middle of a fastq entry. Your file is probably truncated

ADD REPLY • link updated 7.9 years ago by GenoMax 142k • written 7.9 years ago by rob.costa1234 ▴ 310

1

Since the re-downloaded files still generate an error, you should now ask the owners to provide new copies and/or test the files at the source, to see whether the corruption is in the original data.

ADD REPLY • link 7.9 years ago by GenoMax 142k

0

However, I don't understand why md5sum suggests the files are valid while FastQC throws an error. I have even used gzip to make sure the files are valid and not corrupted. Any insight on this would be helpful.

ADD REPLY • link 7.9 years ago by rob.costa1234 ▴ 310

1

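md5sum is format-agnostic: it does not know or care that the file is in FASTQ format. A matching checksum only shows that your copy matches the file the checksum was computed from, so a file that was already malformed at the source will still pass. FastQC, which checks for valid FASTQ format, therefore throws an error while md5sum does not.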
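
To illustrate the difference, here is a rough structural check in the same spirit as the FastQC messages above ("didn't start with +", "ran out of data in the middle of a fastq entry"). It is only a sketch, not how FastQC itself is implemented; check_fastq is a made-up name, and it assumes 4-line records with no wrapped sequences.

```shell
# Hypothetical helper: structural check of a gzipped FASTQ file.
# Returns 0 if every record looks well formed, non-zero otherwise.
check_fastq() {
    # First make sure the gzip stream itself is intact.
    gzip -t "$1" || { echo "not a valid gzip stream: $1" >&2; return 1; }

    gzip -dc "$1" | awk '
        # Line 1 of each record: header must start with @
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: header does not start with @\n", NR; bad = 1; exit
        }
        # Line 2: remember the sequence length
        NR % 4 == 2 { seqlen = length($0) }
        # Line 3: separator must start with +
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: separator does not start with +\n", NR; bad = 1; exit
        }
        # Line 4: quality string must be as long as the sequence
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length %d != sequence length %d\n", NR, length($0), seqlen
            bad = 1; exit
        }
        END {
            # A line count that is not a multiple of 4 means the last record is incomplete.
            if (!bad && NR % 4 != 0) {
                printf "file ends mid-record after line %d\n", NR; bad = 1
            }
            exit bad + 0
        }'
}
```

A file can pass both md5sum and gzip -t and still fail a check like this, for example when the last record's quality line is shorter than its sequence, as in the tail output in the question.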

ADD REPLY • link 7.9 years ago by GenoMax 142k