Question: fastqc Exception in thread "Thread-1" (error)
4.0 years ago, Assa Yeroslaviz (1.2k, Munich) wrote:

Hi,

I am running FastQC (v0.11.4) on Ubuntu 14.04.3 LTS.
I have four fastq files (two samples of paired-end reads). AFAIK they come from old Solexa machines and are in Sanger format.
Somehow, when I try to run fastqc on the _1 files, I get the following error message:

fastqc -t 12 -o ../Results/1c3c603f-29ac-4263-851d-b19f9ce4cfb0/fastqcResults/ 61627AAXX_1_1.fastq.gz
Started analysis of 61627AAXX_1_1.fastq.gz
Exception in thread "Thread-1" java.lang.IllegalArgumentException: Unexpected cs char C
        at uk.ac.babraham.FastQC.Sequence.FastQFile.convertColorspaceToBases(FastQFile.java:334)
        at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:191)
        at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
        at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
        at java.lang.Thread.run(Thread.java:745)

Then nothing happens.
When I use the same command for the _2 files, it works fine.

fastqc -t 12 -o ../Results/1c3c603f-29ac-4263-851d-b19f9ce4cfb0/fastqcResults/ 61627AAXX_1_2.fastq.gz
Started analysis of 61627AAXX_1_2.fastq.gz
Approx 5% complete for 61627AAXX_1_2.fastq.gz
...


I can't see any differences in the format of the two files.

The first records of the two files from one of the pairs look like this:

zcat 61GAFAAXX_1_1.fastq.gz | head -n 12
@SOLEXA12_1:1:1:990:4777/1 1:Y:0:0
..................................................
+
##################################################
@SOLEXA12_1:1:1:990:11674/1 1:Y:0:0
..................................................
+
##################################################
@SOLEXA12_1:1:1:990:17662/1 1:Y:0:0
..................................................
+
##################################################

and

zcat 61GAFAAXX_1_2.fastq.gz | head -n 12
@SOLEXA12_1:1:1:990:4777/2 2:Y:0:0
..................................................
+
##################################################
@SOLEXA12_1:1:1:990:11674/2 2:Y:0:0
..................................................
+
##################################################
@SOLEXA12_1:1:1:990:17662/2 2:Y:0:0
..................................................
+
##################################################

Any ideas why I can't run the _1 files?

thanks
Assa

P.S.
When I run the SolexaQA++ tool, I can read all four files without difficulty.


Are the lines that are just "..................." actually there or did you just censor the sequence? If those are actually there then the fastq files aren't valid and I'm not surprised that fastqc is complaining.

modified 2 days ago by RamRS (25k) • written 4.0 years ago by Devon Ryan (93k)

Yes, they are really there. But fastqc complains only about the two _1 files of the pairs; the _2 partners run without a problem. For that reason I don't think the "." in the sequence is the cause.

I have taken multiple subsets of the data and narrowed the problem down to the record in rows 841-844. If I take only the first 840 rows, I can run fastqc, but if I add the next four lines, it gives me the error message.

Unfortunately, I can't see any difference between these four rows and the rest of the data.

modified 4.0 years ago • written 4.0 years ago by Assa Yeroslaviz (1.2k)

FastQC is assuming that you have colorspace data, since that's the only place "." is valid. However, it looks like you have non-colorspace data (i.e., you probably have normal base-space data), which is what's confusing it and causing the errors. Where did these files come from (i.e., what type of machine and when)?

written 4.0 years ago by Devon Ryan (93k)
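
Since the first few records of the _1 and _2 files look identical, one way to compare the files as a whole is to tally which characters occur on the sequence lines. The following is only a minimal Python sketch: the function names `sequence_alphabet` and `count_file` are made up for illustration, and it assumes plain four-line FASTQ records in a gzip-compressed file.

```python
import gzip
from collections import Counter


def sequence_alphabet(lines):
    """Count the characters found on sequence lines (2nd line of each
    four-line FASTQ record). Assumes records are never wrapped."""
    counts = Counter()
    for index, line in enumerate(lines):
        if index % 4 == 1:  # 0-based index 1, 5, 9, ... are sequence lines
            counts.update(line.strip())
    return counts


def count_file(path):
    """Apply sequence_alphabet to a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return sequence_alphabet(handle)


# Hypothetical usage on the files from this thread:
#   print(count_file("61627AAXX_1_1.fastq.gz"))
#   print(count_file("61627AAXX_1_2.fastq.gz"))
```

A tally like this does not say where a problem record is, but it would show whether both files contain the same mix of "." and base characters.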

These are Illumina (Solexa) reads from cancer patients: 50 bp paired-end reads.

But this doesn't explain why the _2 files can be read with fastqc and the _1 files can't.

modified 2 days ago by RamRS (25k) • written 4.0 years ago by Assa Yeroslaviz (1.2k)

Are the fastq fragments you posted above from those suspicious lines? With the fastq lines you posted, fastqc does not complain in my case.

Edit: Sorry, I overlooked the 'head' command. Could you paste lines 836 to, e.g., 852?

modified 4.0 years ago • written 4.0 years ago by dschika (290)
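
For readers who want to pull an arbitrary range of rows out of a compressed fastq file, as requested here, this is a minimal Python sketch. The helper name `gz_line_range` is hypothetical, and the sketch assumes the file is gzip-compressed text.

```python
import gzip
from itertools import islice


def gz_line_range(path, first, last):
    """Return lines first..last (1-based, inclusive) of a gzipped text file.

    islice reads only as far as needed, so the rest of a large file is not
    decompressed.
    """
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n") for line in islice(handle, first - 1, last)]


# Hypothetical usage for the rows discussed in this thread:
#   for row in gz_line_range("61627AAXX_1_1.fastq.gz", 837, 852):
#       print(row)
```

Starting at a row number of the form 4k+1 (such as 837) keeps the output aligned on whole four-line records.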

These are rows 837-852:

@SOLEXA9_1:1:1:1072:8268/1 1:Y:0:0
A.................................................
+
##################################################
@SOLEXA9_1:1:1:1072:10325/1 1:Y:0:0
GC................................................
+
##################################################
@SOLEXA9_1:1:1:1072:14294/1 1:Y:0:0
CT................................................
+
##################################################
@SOLEXA9_1:1:1:1073:8096/1 1:Y:0:0
TG................................................
+
##################################################


But I can't see any differences.

I have uploaded the first 852 rows of my file here. Maybe someone can test it and see whether it runs on their machine.

thanks

written 4.0 years ago by Assa Yeroslaviz (1.2k)

The "." is valid in a csfasta file but not a normal fastq file. Likewise, a sequence starting with GC is invalid in a csfastq file (you can have one and only one base at the beginning AFAIK...and it's usually T from what I remember). My only guess is that this was originally color space data and someone tried to convert it to base space at some point.

Edit: This also explains why you got the "Unexpected cs char C" message, since "cs" means "color space".

modified 4.0 years ago • written 4.0 years ago by Devon Ryan (93k)
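
The distinction drawn in this answer can be illustrated with a rough Python sketch that sorts each read into one of four groups. This is not FastQC's actual detection logic: the regular expressions, the category names, and the helper `first_mixed` are assumptions made for illustration, with "colorspace-like" taken to mean exactly one leading base followed only by the digits 0-3 or ".".

```python
import re

# Illustrative patterns only; these are assumptions, not FastQC's own rules.
BASE_SPACE = re.compile(r"[ACGTN]+", re.IGNORECASE)
ALL_DOTS = re.compile(r"\.+")
COLOR_SPACE = re.compile(r"[ACGT][0-3.]+", re.IGNORECASE)


def classify(seq):
    """Label a read as base-space, no-calls, colorspace-like or mixed."""
    if BASE_SPACE.fullmatch(seq):
        return "base-space"
    if ALL_DOTS.fullmatch(seq):
        return "no-calls"
    if COLOR_SPACE.fullmatch(seq):
        return "colorspace-like"
    return "mixed"  # e.g. two or more bases followed by dots


def first_mixed(lines, first_line_number=1):
    """Return (line number, sequence) of the first 'mixed' read, or None.

    `lines` must start on a record boundary; `first_line_number` is the
    row number of lines[0] in the original file.
    """
    for offset, line in enumerate(lines):
        if offset % 4 == 1 and classify(line.strip()) == "mixed":
            return first_line_number + offset, line.strip()
    return None
```

Under these assumptions, the read starting with a single "A" in rows 837-852 counts as colorspace-like, while the read starting with "GC" is the first mixed one. It sits on row 842, inside the rows 841-844 that were found to trigger the error.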