Question: How to repair corrupted fastq files after sortmeRNA
SMILE wrote (18 months ago):

Hi all. After removing rRNA from the fastq files with sortmeRNA, one of the paired read files was corrupted, and FastQC failed on it with this error:

uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
    at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
    at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
    at java.lang.Thread.run(Thread.java:748)

I counted the lines in the two paired read files after sortmeRNA and found that one file had two more lines than the other.

`wc -l S3-sortmerna_1.fq S3-sortmerna_2.fq`

**210133674 S3-sortmerna_1.fq**

**210133672 S3-sortmerna_2.fq**
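A line count that differs between mates only shows that something is off, not where. As a minimal sketch (the function name `fastq_first_problem` is hypothetical, and it assumes unwrapped 4-line FASTQ records), the following reports the first line that breaks the layout:

```shell
# Hypothetical helper: report where a FASTQ file stops following the
# 4-lines-per-record layout. Assumes unwrapped (single-line) sequences.
fastq_first_problem() {
    awk '
        # Line 1 of each record must start with "@", line 3 with "+".
        # The check is positional because quality lines may also start with "@".
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: expected @ header\n", NR; bad = 1; exit
        }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: expected + separator\n", NR; bad = 1; exit
        }
        END {
            if (!bad && NR % 4 != 0) {
                printf "line count %d is not a multiple of 4\n", NR; bad = 1
            }
            exit bad + 0    # exit status 0 only when nothing was flagged
        }
    ' "$1"
}
```

It could be called as `fastq_first_problem S3-sortmerna_2.fq`; the reported line may sit a line or two after the actual damage, since a stray line in the middle of a record is only noticed at the next header or separator position.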

Can someone explain why this happened and give me some advice on how to repair the fastq file?

Below are the commands I used to run sortmeRNA and FastQC:

sortmerna --ref $REF --reads ./S3-interleaved.fq --sam --num_alignments 1 --fastx --aligned ./S3_rRNA --other ./S3_non_rRNA --log -v --paired_in

unmerge-paired-reads.sh ./S3_non_rRNA.fq ./S3-sortmerna_1.fq ./S3-sortmerna_2.fq

fastqc ./S3-sortmerna_1.fq ./S3-sortmerna_2.fq

*Started analysis of S3-sortmerna_1.fq*

*Approx 5% complete for S3-sortmerna_1.fq*

.

.

.

*Approx 95% complete for S3-sortmerna_1.fq*

*Analysis complete for S3-sortmerna_1.fq*


*Started analysis of S3-sortmerna_2.fq*

*Approx 5% complete for S3-sortmerna_2.fq*

*Approx 10% complete for S3-sortmerna_2.fq*

*Approx 15% complete for S3-sortmerna_2.fq*

*Approx 20% complete for S3-sortmerna_2.fq*

*Approx 25% complete for S3-sortmerna_2.fq*

*Approx 30% complete for S3-sortmerna_2.fq*

*Failed to process file S3-sortmerna_2.fq*

*uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'*
    *at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)*
    *at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)*
    *at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)*
    *at java.lang.Thread.run(Thread.java:748)*
modified 16 months ago by matt.shenton • written 18 months ago by SMILE

Have you checked that the original files themselves were not corrupt before you ran sortmeRNA?

You can use repair.sh from the BBMap suite to re-pair the files (see this thread: "Calculating number of reads for paired end reads?").

modified 18 months ago • written 18 months ago by genomax

Can you explain how your tool will repair the corrupted fastq files? The original files were not corrupted. Something went wrong when I ran sortmerna and unmerge-paired-reads.sh to get the paired files: they have different numbers of lines (210133674 S3-sortmerna_1.fq, 210133672 S3-sortmerna_2.fq).

written 18 months ago by SMILE

repair.sh compares records in two files; it should keep those that have a match in both and write any singletons to a separate file. That said, if your file has corrupt fastq records (i.e. they don't have 4 lines per record, which may be the case here), then repair.sh may not work. You may get an error, or it may remove more than 2 reads.
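To illustrate the idea of matching records across the two files, here is a minimal sketch that lists read IDs present in only one mate file. It is not how repair.sh is implemented; the function name `unpaired_ids` is hypothetical, it assumes intact 4-line records with an optional `/1` or `/2` suffix on the ID, and it holds every ID in memory, so it is only suitable for small test files:

```shell
# Hypothetical helper: print read IDs that appear in only one of two FASTQ
# files. Assumes 4-line records and an optional /1 or /2 suffix on the ID.
unpaired_ids() {
    awk '
        FNR % 4 == 1 {
            id = $1                 # first whitespace-delimited field of the header
            sub(/\/[12]$/, "", id)  # drop a trailing /1 or /2 mate suffix
            if (FILENAME == ARGV[1]) seen1[id] = 1; else seen2[id] = 1
        }
        END {
            for (id in seen1) if (!(id in seen2)) print id "\tonly in file 1"
            for (id in seen2) if (!(id in seen1)) print id "\tonly in file 2"
        }
    ' "$1" "$2" | sort
}
```

If the records themselves are broken (for example by a stray blank line), the header positions shift and this check reports nonsense IDs, which is the same reason repair.sh may fail on such input.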

If you are sure the original files are fine, then perhaps try re-running sortmeRNA.

modified 18 months ago • written 18 months ago by genomax

matt.shenton wrote (16 months ago):

I found a similar issue using sortmerna-2.1b

Out of 24 fastq files, 2 had a problem.

I checked them using https://github.com/statgen/fastQValidator

I found that at the lines where fastQValidator reported a problem, sortmerna had introduced a blank line; thus the fastq header was flagged as too short, and the subsequent lines were in the wrong place: the header was where the sequence should be.

I simply edited the files with vi and removed the blank lines, and now they pass validation with fastQValidator.
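For files too large to edit comfortably by hand, the same fix can be scripted. A minimal sketch, assuming empty lines are the only damage (the function name `strip_blank_lines` is hypothetical):

```shell
# Hypothetical helper: copy a FASTQ file without its empty lines.
# Writes to a new file so the original is kept for comparison.
strip_blank_lines() {
    awk 'length($0) > 0' "$1" > "$2"
}
```

For example, `strip_blank_lines S3_non_rRNA.fq S3_non_rRNA.clean.fq` (the output name is made up here). It is worth re-validating the result afterwards rather than assuming the file is now correct.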

This is not a random error, because I repeated sortmerna with the same files (which had no problem after conversion to interleaved format with sortmerna-2.1b/scripts/merge-paired-reads.sh) and both files had the same problem again.

I haven't checked, but maybe another program I've used upstream for read quality control has introduced some character that then causes sortmerna to exhibit this behaviour.

Hope this is helpful.

written 16 months ago by matt.shenton

If I were you I would check more than blank lines. I am encountering the same problem right now, and I noticed that this error completely messes up the 4th lines containing the quality information: they end up assigned to different reads. At least in my case.

EDIT: I've just realized that the misplacement is caused by deinterleaving the merged reads without first deleting the blank line. Sorry for the misinformation.
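A minimal sketch of deinterleaving that drops empty lines before splitting, so that a stray blank line cannot shift quality strings onto the wrong reads. This is not the unmerge-paired-reads.sh script shipped with sortmerna; the function name `deinterleave_clean` is hypothetical, and it assumes strictly alternating mates with 4 lines per record:

```shell
# Hypothetical helper: split an interleaved FASTQ into two mate files,
# dropping empty lines first so records stay aligned on 8-line groups.
deinterleave_clean() {
    awk -v out1="$2" -v out2="$3" '
        length($0) == 0 { next }        # skip stray blank lines
        {
            n++                         # count only the lines that are kept
            if ((n - 1) % 8 < 4) print > out1   # first 4 of each 8: mate 1
            else                 print > out2   # last 4 of each 8: mate 2
        }
    ' "$1"
}
```

Usage would look like `deinterleave_clean S3_non_rRNA.fq S3-sortmerna_1.fq S3-sortmerna_2.fq`.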

modified 14 months ago • written 14 months ago by Marek Glombik

Oh ok. I wasn't aware of that! Luckily it doesn't matter for me, since I trim the reads beforehand and mapping doesn't consider base calling quality. Nevertheless, I will have to check all my data to be sure.

written 14 months ago by caggtaagtat

Thank you so much! I had the same problem and found the blank line exactly where STAR couldn't proceed. Interestingly, salmon had no problems with that blank line whatsoever.

Thanks again for pointing out what to look for. I will have to implement a control for this error in my workflow!
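One way such a control could look is a pass/fail check that a workflow runs on every fastq file after sortmeRNA. This is a minimal sketch rather than a full validator: the function name `fastq_ok` is hypothetical, and it assumes 4-line records with no zero-length reads:

```shell
# Hypothetical workflow gate: succeed only if the file has no empty lines,
# a line count divisible by 4, and equal sequence/quality lengths per record.
fastq_ok() {
    awk '
        length($0) == 0                      { exit_code = 1; exit }
        NR % 4 == 2                          { seq_len = length($0) }
        NR % 4 == 0 && length($0) != seq_len { exit_code = 1; exit }
        END {
            if (NR % 4 != 0) exit_code = 1
            exit exit_code + 0
        }
    ' "$1"
}
```

In a pipeline script this could be used as `fastq_ok S3-sortmerna_2.fq || { echo "invalid fastq" >&2; exit 1; }`, so that the run stops before the aligner sees the file.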

Edit: Spelling

modified 16 months ago • written 16 months ago by caggtaagtat