13 months ago
Ak ▴ 60
Hi! I'm trying to check the quality of my raw genome reads using FastQC, but I'm encountering this issue. Does anyone know what I can do to progress with my analysis? Thanks!
$ fastqc EtenNg5_ACAGTG_L004_R1_001.fastq.gz
Started analysis of EtenNg5_ACAGTG_L004_R1_001.fastq.gz
Failed to process file EtenNg5_ACAGTG_L004_R1_001.fastq.gz
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
    at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
    at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
    at java.base/java.lang.Thread.run(Thread.java:832)
Did you look at this error:
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'?
Try two things:
sed -nr '1~4p' failed.fq | grep -v "@"
This should print headers without @. You can add @ to the headers. You can also do
zcat failed.fq.gz | sed -nr '1~4p' | grep "@"
to list the headers/IDs without @.
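A slightly stricter variant of this check, as a rough sketch: `grep -v "@"` only drops lines that contain `@` anywhere, while the FastQC error is about ID lines that do not start with `@`. The function name `bad_headers` and the demo file are made up for illustration. The sketch assumes four-line FASTQ records and uses awk's `NR % 4 == 1` in place of GNU sed's `1~4p`.

```shell
#!/usr/bin/env bash
# Print every record-header line (lines 1, 5, 9, ...) that does not start
# with '@', prefixed with its line number. Assumes 4-line FASTQ records.
bad_headers() {
    awk 'NR % 4 == 1 && !/^@/ { print NR ": " $0 }'
}

# Demo on a tiny made-up file: the second record has a broken header.
demo=$(mktemp)
printf '%s\n' '@read1' 'ACGT' '+' 'IIII' 'read2' 'ACGT' '+' 'IIII' > "$demo"
bad_headers < "$demo"    # prints "5: read2"
rm -f "$demo"
```

On a compressed file this could be run as `zcat failed.fq.gz | bad_headers`. No output would mean every fourth line starts with `@`.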
For the 1st one, I got this at the end of the list
And for the 2nd method, I wasn't able to gunzip the file to run sed
So I tried the zcat command and got this
It's only the read headers. Something is very wrong here. How did you obtain these files?
I got it from the sequencing company
Somewhere along the way these data files appear to have become corrupt. If you are able to, download a new copy.
Ah, my bad 😅
Both commands I asked him/her to run print only headers, which is probably why the screenshots show only headers. I was checking whether the file has any headers without @. However, it seems the file is corrupted.
Sorry, there was a typo:
zcat failed.fq.gz | sed -nr '1~4p' | grep "@"
should be
zcat failed.fq.gz | sed -nr '1~4p' | grep -v "@"
However, your CRC error seems to stem from a corrupt file. Could you ask your core for the MD5 checksums of the files you have? Generate MD5 sums for your copies and compare them with the ones provided by the core. If they do not match, request the data from the core again.
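The MD5 comparison could look roughly like this. It assumes GNU coreutils `md5sum` is installed and that the core supplies its checksums in the usual `md5sum` list format. The file names `sample.fastq.gz` and `core.md5` are placeholders created by the demo, not the real data.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-ins for a delivered file and the checksum list from the core
# (the demo file is plain text; only its bytes matter for the checksum).
printf 'demo data\n' > sample.fastq.gz
md5sum sample.fastq.gz > core.md5

# Compare local files against the list; prints OK or FAILED per file.
md5sum -c core.md5

# Simulate a corrupted download and check again.
printf 'extra bytes\n' >> sample.fastq.gz
if md5sum -c core.md5 >/dev/null 2>&1; then
    echo "checksums match"
else
    echo "checksum mismatch: request the file again"
fi
```

With real data, only the `md5sum -c` step would be needed, run in the directory that holds the FASTQ files and the list from the core.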
Either FastQC can't take gzipped files as input, or your FASTQ files are not correctly formatted, it seems.
Most probably it is an issue with the FASTQ file, because I ran read 2 for this genome the same way and it was fine. So it seems there's no other way to resolve this issue?
Oh, I usually just run the command without stating an output file. Only 'fastqc' and its input file.
Please just type
file EtenNg5_ACAGTG_L004_R1_001.fastq.gz
into your terminal and paste the output here. This command checks whether the file is compressed or not.
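Alongside `file`, the sketch below checks two things at once: whether a file starts with the gzip magic bytes, and whether the compressed stream is intact (`gzip -t` tests integrity, including the CRC). The helper name `is_gzip` and the demo files are invented for illustration, and the truncated file only mimics one kind of damage.

```shell
#!/usr/bin/env bash
# Return success if the file starts with the gzip magic bytes 1f 8b.
is_gzip() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Demo on made-up files in a scratch directory.
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$workdir/good.fastq.gz"
# Cut the archive short to mimic an interrupted download.
head -c 20 "$workdir/good.fastq.gz" > "$workdir/truncated.fastq.gz"

for f in "$workdir"/good.fastq.gz "$workdir"/truncated.fastq.gz; do
    if ! is_gzip "$f"; then
        echo "$(basename "$f"): not gzip-compressed"
    elif gzip -t "$f" 2>/dev/null; then
        echo "$(basename "$f"): gzip stream is intact"
    else
        echo "$(basename "$f"): gzip stream is damaged"
    fi
done
```

On the real file, `gzip -t EtenNg5_ACAGTG_L004_R1_001.fastq.gz` alone would be enough: it prints nothing and exits with status 0 when the archive is fine.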
Oh sorry I misunderstood you. I got this: