9 months ago
Ak ▴ 60

Hi! I'm trying to check the quality of my raw genome reads using FastQC, but I'm encountering this issue. Does anyone know what I can do to progress with my analysis? Thanks!

$ fastqc EtenNg5_ACAGTG_L004_R1_001.fastq.gz
Started analysis of EtenNg5_ACAGTG_L004_R1_001.fastq.gz
Failed to process file EtenNg5_ACAGTG_L004_R1_001.fastq.gz
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)


Did you look at this error: uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'?

Try two things:

1. seqkit seq -n <failed.fastq.gz>. See if this works and prints all the headers. It should fail with the error above unless seqkit tolerates malformed headers (IDs).
2. gunzip failed.fastq.gz, then run sed -nr '1~4p' failed.fastq | grep -v "@". This should print any headers that lack the @, and you can then add the @ to them. You can also skip decompressing and run zcat failed.fq.gz| sed -nr '1~4p' | grep "@" to list the headers/IDs without @.
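The sed commands above rely on GNU sed's `first~step` address. A slightly broader sketch in awk, assuming the usual four-line FASTQ records (no wrapped sequence lines) and GNU `zcat` (whose `-f` passes uncompressed input through unchanged), also flags a missing `+` line, a quality string whose length differs from its sequence, and a file that stops mid-record. The function name `check_fastq_structure` is made up for illustration.

```shell
# Hypothetical helper: report structural problems in a 4-line-per-record FASTQ.
# Assumes GNU zcat; -f lets it read plain (uncompressed) files as well.
check_fastq_structure() {
  zcat -f -- "$1" | awk '
    # Line 1 of each record: the ID line must start with "@"
    NR % 4 == 1 && substr($0, 1, 1) != "@" { print "line " NR ": header does not start with @"; bad++ }
    # Line 2: remember the sequence length
    NR % 4 == 2 { seqlen = length($0) }
    # Line 3: the separator must start with "+"
    NR % 4 == 3 && substr($0, 1, 1) != "+" { print "line " NR ": separator does not start with +"; bad++ }
    # Line 4: the quality string must be as long as the sequence
    NR % 4 == 0 && length($0) != seqlen { print "line " NR ": quality length differs from sequence length"; bad++ }
    END {
      if (NR % 4 != 0) { print "file ends mid-record after " NR " lines"; bad++ }
      exit (bad > 0)
    }'
}

# Usage: check_fastq_structure EtenNg5_ACAGTG_L004_R1_001.fastq.gz
```

If the gzip stream itself is damaged, `zcat` prints its own error to stderr and stops early, so a "file ends mid-record" message may show up alongside it.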

For the 1st one, I got this at the end of the list

And for the second method, I wasn't able to gunzip the file to run sed.

So I tried the zcat command instead and got this:


It's only the read headers; something is very wrong here. How did you obtain these files?


I got them from the sequencing company.


Somewhere along the way, these data files appear to have become corrupted. If you are able to, download a new copy.
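One quick way to confirm the corruption before re-downloading is `gzip -t`, which decompresses the file in memory and checks gzip's built-in CRC and length without writing anything to disk. A small sketch that loops over several files; the helper name `check_gzip_integrity` is made up for illustration, and the R2 file name in the usage line is only assumed from the R1 naming.

```shell
# Hypothetical helper: test the gzip integrity (CRC and length) of each argument.
# gzip -t writes no output file; it exits non-zero and prints an error for a
# damaged, truncated, or unreadable file.
check_gzip_integrity() {
  local f status=0
  for f in "$@"; do
    if gzip -t -- "$f"; then
      echo "OK      $f"
    else
      echo "FAILED  $f"
      status=1
    fi
  done
  return "$status"
}

# Usage (R2 file name assumed):
# check_gzip_integrity EtenNg5_ACAGTG_L004_R1_001.fastq.gz EtenNg5_ACAGTG_L004_R2_001.fastq.gz
```

Running it on R1 and R2 together makes the comparison with the read 2 file that worked easy to see.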


Both commands I asked them to run print only headers, which is probably why the screenshots show only headers. I was checking whether the file has any headers without @. However, it seems the file is corrupted.


Sorry, there was a typo: zcat failed.fq.gz| sed -nr '1~4p' | grep "@" should be zcat failed.fq.gz| sed -nr '1~4p' | grep -v "@". However, your CRC error seems to stem from a corrupt file. Could you ask your core for the md5sum files for the data you have? Generate MD5 sums for your copies and compare them with the MD5 sums provided by the core. If they do not match, request the data from the core again.
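The checksum comparison can be scripted. A minimal sketch, assuming GNU coreutils `md5sum` and that the core's list is in md5sum's own format (one `<md5>  <filename>` line per file), saved in the same directory as the FASTQ files; `verify_against_core` and `core_checksums.md5` are hypothetical names.

```shell
# Hypothetical helper: check local files against the core's MD5 list.
# md5sum -c recomputes each listed file's MD5; --quiet suppresses the "OK" lines,
# so only mismatching or unreadable files are printed.
verify_against_core() {
  if md5sum -c --quiet -- "$1"; then
    echo "all checksums match"
  else
    echo "mismatch or unreadable file: request the data again" >&2
    return 1
  fi
}

# To print your own checksum for a by-eye comparison:
# md5sum EtenNg5_ACAGTG_L004_R1_001.fastq.gz
# Usage: verify_against_core core_checksums.md5
```

If the core's list records different paths than your local layout, md5sum reports the files as missing rather than mismatched, so adjust the paths in the list first.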


It seems that either FastQC can't take gzipped files as input, or your FASTQ files are not correctly formatted.


Most probably it's an issue with the FASTQ file, because I ran read 2 for this genome the same way and it was fine. So it seems there's no other way to resolve this issue?


Output of file EtenNg5_ACAGTG_L004_R1_001.fastq.gz?


Oh, I usually just run the command without specifying an output file, only 'fastqc' and its input file.


Please just type file EtenNg5_ACAGTG_L004_R1_001.fastq.gz into your terminal and paste the output here. This command checks whether the file is compressed or not.
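`file` recognises gzip data by the two magic bytes `1f 8b` at the very start of the file, then decodes a few header fields. The same first check can be done by hand, as a sketch with a made-up helper name. Passing it only says the file starts like gzip, not that the rest of the compressed stream is intact.

```shell
# Hypothetical helper: succeed if the file begins with the gzip magic bytes 1f 8b.
# This only inspects the header; it cannot detect damage later in the stream.
is_gzip() {
  [ "$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Usage: is_gzip EtenNg5_ACAGTG_L004_R1_001.fastq.gz && echo "gzip header present"
```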


Oh, sorry, I misunderstood you. I got this:

EtenNg5_ACAGTG_L004_R1_001.fastq.gz: gzip compressed data, max speed