8.5 years ago
rob.costa1234
I merged several FASTQ files for the same sample (using cat), and when I ran FastQC it gave me an error that the file is truncated. This is what I saw when I looked at the tail of the file:
gzip: ABC_10_XXX_ACCTCA_L001_R2_merged.fastq.gz: unexpected end of file
+
0<0<BFFFFF0<BFF00BBFFFFIIBFFFI7B7B7<<<B<BBBBB7'00'7<7BBB0<B0''07<07<BB##############################
@HWI-ST1148:230:HAJHGAVXX:2:2106:12786:42111 2:N:0:ACCTCA
CTCTTCCGATATTTACACGGAAGAGAGGAGGATAGTTATACGGATCCGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGGATCATTA
+
<'0<BBB7'<0<000'B<BB<<<BBBB<<'070<7''07<<'''77'7077BB<<BB###########################################
@HWI-ST1148:230:HAJHGADXX:2:2106:12909:42165 2:N:0:ACCTCA
ATAAGTTCCATTCAATACCATTTCTTTTGAGTCCATTCTATCTGATTCCATTCCCTTTGATTCCATAACATTTGAGTCCATTCAATTCCATTCCTTTCGT
+
BBBFF<BFFFFF<FF0BFFIIF0BFFFFF<BFFFB00<BFF'0<
Then I also looked at each file I had used in the cat, and FastQC gives the same error message for them. Is there any way I can rescue the data, and what may be the reason for this error message? Thanks
What's the size of the file, what's your OS, and what's your filesystem? See https://en.wikipedia.org/wiki/Large_file_support
It does indeed look like the file is truncated. The only thing you can do is download it again and hope the original one online is OK.
The total is approx. 800 MB (400+350). I used Linux and the file is in gz format. This sequencing was done by a core, and we downloaded the file again.
Did you merge them while they were compressed, or before?
I used cat to merge all R1 / R2 gz files into a merged gz output.
You could try to repeat that, and make sure no errors or interruptions happen in the process. If that doesn't solve it, you should probably, as Devon wrote, try to access the original file.
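A minimal sketch of repeating the merge with a built-in check, written in Python rather than with cat. The function names (`merge_gzip_files`, `count_decompressed_bytes`, `merge_and_verify`) are made up for illustration and are not from this thread. It relies on the fact that gzip files concatenated byte for byte form a valid multi-member gzip file, so the merged file should decompress to the summed size of its parts.

```python
import gzip
import shutil


def merge_gzip_files(parts, merged_path):
    """Concatenate gzip files byte for byte, like `cat a.gz b.gz > merged.gz`."""
    with open(merged_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def count_decompressed_bytes(path, chunk_size=1 << 20):
    """Decompress the whole file and return its uncompressed size.

    A damaged or truncated stream raises an exception here
    (typically EOFError, or an OSError subclass for a bad header).
    """
    total = 0
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def merge_and_verify(parts, merged_path):
    """Merge the parts, then confirm the result decompresses to the summed size."""
    # Reading every part first also surfaces a corrupt input before merging.
    expected = sum(count_decompressed_bytes(p) for p in parts)
    merge_gzip_files(parts, merged_path)
    actual = count_decompressed_bytes(merged_path)
    if actual != expected:
        raise ValueError(f"merged size {actual} != expected {expected}")
    return actual
```

If one of the input files is itself truncated, the first step raises before anything is written, which tells you the problem is in the inputs and not in the merge.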
Thanks for the input. I think I can use either md5sum or gzip. However, will either one allow me to check all the files in a directory? Otherwise it may be too complicated to do it on each file. I googled it but could not find a way to run either tool in batch on a directory. All individual files are in gz format.
You can run
md5sum *.gz
or
md5sum -c checksums_file
To use gzip on all of the files:
Have a look at the individual files then and see if only one is corrupt. If none are, just concatenate them again. BTW, there's a gzip option that tests for integrity.
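For the batch question, here is a rough Python sketch that tests every .gz file in a directory by decompressing it to the end. It is similar in spirit to running `gzip -t` on each file, but it is not the command from the reply above. The helper names `gzip_is_intact` and `check_directory` are hypothetical.

```python
import gzip
import zlib
from pathlib import Path


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return (True, "") if the file decompresses cleanly, else (False, reason)."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data; only the absence of errors matters.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error) as err:
        return False, f"{type(err).__name__}: {err}"
    return True, ""


def check_directory(directory, pattern="*.gz"):
    """Test every file matching `pattern` and return {file name: (ok, reason)}."""
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        results[path.name] = gzip_is_intact(path)
    return results
```

Calling `check_directory(".")` and printing the entries whose first element is `False` would list the files that are damaged at the compression level. A file can pass this test and still contain malformed FASTQ records, because the check looks only at the gzip layer.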
So I ran md5sum on every single file as well as on the merged files. A few files give errors in FastQC (version 0.11.5), as shown in the following screenshots, while the same files pass the md5sum test. These are five files which, even when re-downloaded from the source, give an error in FastQC but pass md5sum. Any suggestions from experts will be helpful.
Since the re-downloaded files are still generating an error you would now want to ask the owners to look into providing new copies and/or testing the files at source to see if the corruption is in original data.
However, I could not understand why md5sum suggests the file is valid but FastQC throws an error. I have even used gzip just to make sure the files are valid and not corrupted. Any insight from this perspective will be helpful.
md5sum is format-agnostic and does not know or care that the file is in FASTQ format. So FastQC (which checks for valid FASTQ format) throws an error but md5sum does not.
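To make that difference concrete, here is a small sketch that contrasts the two kinds of check. It is not how FastQC works internally. `md5_of_file` and `first_fastq_problem` are hypothetical helpers, and the format check assumes plain four-line FASTQ records inside a gzip stream that decompresses without error. A checksum only shows that the bytes match some reference copy; if the copy at the source is already malformed, a faithful download will match it and still fail a format check.

```python
import gzip
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Hash the raw bytes; this says nothing about what the bytes mean."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_fastq_problem(path):
    """Return a description of the first malformed record, or None if none is found.

    Assumes four-line records: '@' header, sequence, '+' line, quality string.
    """
    with gzip.open(path, "rt") as handle:
        record_number = 0
        while True:
            header = handle.readline()
            if not header:
                return None  # clean end of file
            record_number += 1
            seq = handle.readline().rstrip("\n")
            plus = handle.readline().rstrip("\n")
            qual = handle.readline().rstrip("\n")
            if not header.startswith("@"):
                return f"record {record_number}: header does not start with '@'"
            if not plus.startswith("+"):
                return f"record {record_number}: third line does not start with '+'"
            if len(seq) != len(qual):
                return (
                    f"record {record_number}: sequence length {len(seq)} "
                    f"but quality length {len(qual)}"
                )
```

A record whose quality string is shorter than its sequence, like the last one in the tail output at the top of this thread, is reported by `first_fastq_problem`, while `md5_of_file` still returns a digest for the same file without complaint.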