How to repair corrupted fastq files after sortmeRNA
1
0
3.2 years ago
SMILE ▴ 160

Hi all, after removing rRNA from my fastq files with sortmeRNA, one of the paired read files was corrupted, and FastQC failed on it with this error:

uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)


I counted the lines in the two paired read files after sortmeRNA and found that one file had two more lines than the other.

wc -l S3-sortmerna_1.fq S3-sortmerna_2.fq

**210133674 S3-sortmerna_1.fq**

**210133672 S3-sortmerna_2.fq**


Can someone explain why this happened and give me some advice on how to repair the fastq file?

Below are the command lines I used to run sortmeRNA and FastQC:

sortmerna --ref $REF --reads ./S3-interleaved.fq --sam --num_alignments 1 --fastx --aligned ./S3_rRNA --other ./S3_non_rRNA --log -v --paired_in

unmerge-paired-reads.sh ./S3_non_rRNA.fq ./S3-sortmerna_1.fq ./S3-sortmerna_2.fq

fastqc ./S3-sortmerna_1.fq ./S3-sortmerna_2.fq

*Started analysis of S3-sortmerna_1.fq*

*Approx 5% complete for S3-sortmerna_1.fq*

.

.

.

*Approx 95% complete for S3-sortmerna_1.fq*

*Analysis complete for S3-sortmerna_1.fq*

*Started analysis of S3-sortmerna_2.fq*

*Approx 5% complete for S3-sortmerna_2.fq*

*Approx 10% complete for S3-sortmerna_2.fq*

*Approx 15% complete for S3-sortmerna_2.fq*

*Approx 20% complete for S3-sortmerna_2.fq*

*Approx 25% complete for S3-sortmerna_2.fq*

*Approx 30% complete for S3-sortmerna_2.fq*

*Failed to process file S3-sortmerna_2.fq*

*uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'*
*at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)*
*at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)*

software error RNA-Seq alignment • 3.0k views
0

Have you checked that the original files were not already corrupt before you ran sortmeRNA?

You can use repair.sh from the BBMap suite to re-pair the files (see this thread: "Calculating number of reads for paired end reads?").

0

Can you explain how your tool will repair the corrupted fastq files? The original files were not corrupted. Something went wrong when I ran sortmerna and unmerge-paired-reads.sh to get the paired files: they have different numbers of lines (210133674 in S3-sortmerna_1.fq, 210133672 in S3-sortmerna_2.fq).

0

repair.sh compares the records in the two files, keeps those that have a mate in both, and moves any singletons to a separate file. That said, if your file has corrupt fastq records (i.e. records that don't have 4 lines each, which may be the case here), then repair.sh may not work. You may get an error, or it may remove more than 2 reads.

If you are sure the original files are fine, then perhaps try re-running sortmeRNA.
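
The "4 lines per record" condition mentioned above can be checked directly before trying repair.sh. Here is a minimal Python sketch of such a check. It is not part of BBMap or FastQC; the function name `first_bad_record` is made up for illustration, and it assumes an uncompressed fastq file in which sequences are not wrapped over several lines.

```python
from itertools import zip_longest


def first_bad_record(lines):
    """Return (line_number, reason) for the first malformed record, or None.

    Assumes unwrapped fastq: header, sequence, '+' line, quality.
    `lines` is any iterable of strings, e.g. an open file handle.
    """
    stripped = (ln.rstrip("\r\n") for ln in lines)
    # Read the stream four lines at a time; a short final chunk is padded with None.
    chunks = zip_longest(*[stripped] * 4)
    for i, (head, seq, plus, qual) in enumerate(chunks):
        start = 4 * i + 1  # 1-based line number of the expected header
        if not head.startswith("@"):
            return start, "ID line does not start with '@'"
        if seq is None or plus is None or qual is None:
            return start, "truncated record"
        if not plus.startswith("+"):
            return start + 2, "separator line does not start with '+'"
        if len(seq) != len(qual):
            return start + 3, "sequence and quality lengths differ"
    return None
```

With a real file this could be called as `with open("S3-sortmerna_2.fq") as fh: print(first_bad_record(fh))`. The reported line number is only where the 4-line rhythm first breaks, so the actual cause may sit a line or two earlier.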

4
3.0 years ago
matt.shenton ▴ 40

I found a similar issue using sortmerna-2.1b

Out of 24 fastq files, 2 had a problem.

I checked them using https://github.com/statgen/fastQValidator

I found that at the lines where fastQValidator reported a problem, sortmerna had introduced a blank line. The blank line was flagged as a header that is too short, and the subsequent lines were shifted out of place, so the header sat where the sequence should be.

I simply edited with vi and removed the blank lines, and now they pass validation with fastQvalidator.
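
For files too large to edit comfortably by hand, the same fix can be scripted. This is a rough Python sketch of the idea, not the method used above; the function name `drop_blank_lines` and the output file name are invented for illustration. It assumes there are no zero-length reads, because those legitimately have empty sequence and quality lines that this filter would also delete.

```python
def drop_blank_lines(lines):
    """Yield only the lines that contain something other than whitespace.

    Assumption: the fastq data has no zero-length reads, so every
    blank line is an artifact rather than an empty sequence or quality.
    """
    for line in lines:
        if line.strip():
            yield line


# Example usage (the output name is a placeholder):
# with open("S3_non_rRNA.fq") as src, open("S3_non_rRNA.clean.fq", "w") as dst:
#     dst.writelines(drop_blank_lines(src))
```

As a later comment in this thread points out, it is probably safer to do this on the interleaved file before de-interleaving, and to re-validate the result afterwards.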

This is not a random error, because I repeated sortmerna with the same files (which had no problem after conversion to interleaved format with sortmerna-2.1b/scripts/merge-paired-reads.sh) and both files had the same problem again.

I haven't checked, but maybe another program I've used upstream for read quality control has introduced some character that then causes sortmerna to exhibit this behaviour.

Hope this is helpful.

1

If I were you I would check for more than blank lines. I am encountering the same problem right now, and I noticed that this error completely messes up the 4th lines, which contain the quality information: they end up attached to the wrong reads. At least in my case.

EDIT: I've just realized that the misplacement is caused by deinterleaving merged reads without deleting the blank line after the process. Sorry for the misinformation.
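
To illustrate the effect described in the EDIT, here is a toy Python sketch. It assumes a de-interleaver that assigns lines purely by position (lines 1-4 of every block of 8 to mate 1, lines 5-8 to mate 2). Whether unmerge-paired-reads.sh works exactly this way is an assumption; the function name and the reads are invented.

```python
def deinterleave_by_position(lines):
    """Split interleaved fastq lines into two mates purely by line position."""
    mate1, mate2 = [], []
    for i, line in enumerate(lines):
        # Positions 0-3 of every block of 8 go to mate 1, positions 4-7 to mate 2.
        (mate1 if i % 8 < 4 else mate2).append(line)
    return mate1, mate2


clean = ["@a/1", "AAAA", "+", "IIII", "@a/2", "CCCC", "+", "JJJJ",
         "@b/1", "GGGG", "+", "KKKK", "@b/2", "TTTT", "+", "LLLL"]
# Same data with one stray blank line after the first pair.
broken = clean[:8] + [""] + clean[8:]

m1, m2 = deinterleave_by_position(broken)
# From the blank line onward every line is shifted by one position:
# the quality line of b/1 ("KKKK") lands in mate 2, and mate 1 ends up
# with one more line than mate 2.
print(len(m1), len(m2))
print(m2[4])
```

In this toy case a single blank line is enough to give the two output files different line counts and to move quality strings to the wrong file, which matches the symptoms reported in this thread.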

0

Oh ok. I wasn't aware of that! Luckily it doesn't matter for me, since I trim the reads beforehand and mapping doesn't consider base calling quality. Nevertheless, I will have to check all my data to be sure.

0

Thank you so much! I had the same problem and found the blank line exactly where STAR couldn't proceed. Interestingly, salmon had no problems with that blank line whatsoever.

Thanks again for pointing out what to look for. I will have to implement a control for this error in my workflow!
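
One cheap control of this kind is a sanity check on the line counts of the two mate files. The sketch below is only an illustration in Python; the function name `check_mate_line_counts` is made up, and passing this check does not prove the files are valid, it only catches the symptom reported in the question.

```python
def check_mate_line_counts(n1, n2):
    """Return a list of problems; an empty list means the counts look sane."""
    problems = []
    for name, n in (("mate 1", n1), ("mate 2", n2)):
        # An unwrapped fastq file must have exactly 4 lines per record.
        if n % 4 != 0:
            problems.append(f"{name}: {n} lines is not a multiple of 4")
    if n1 != n2:
        problems.append(f"mates differ by {abs(n1 - n2)} lines")
    return problems


# Line counts reported by `wc -l` in the question at the top of this thread.
print(check_mate_line_counts(210133674, 210133672))
```

For the counts from the question this flags the first file (210133674 leaves a remainder of 2 when divided by 4) and the 2-line difference between the mates.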

Edit: Spelling