Question: fastx toolkit problem: fastq to fasta
5.8 years ago by
biolab1.2k wrote:

Hi everyone,

I am converting FASTQ to FASTA with the FASTX-Toolkit, using the following command: fastq_to_fasta -i in.fq -o out.fa

However, an error message pops up:

fastq_to_fasta: Invalid quality score value (char '+' ord 43 quality value -21) on line 12


The following are the first 16 lines of in.fq. What's wrong with line 12? Thank you very much!

@ctl.2 HWI-D00169:39:D1Y16ACXX:7:1101:1639:2164 length=100
+ctl.2 HWI-D00169:39:D1Y16ACXX:7:1101:1639:2164 length=100
@ctl.3 HWI-D00169:39:D1Y16ACXX:7:1101:1787:2165 length=100
+ctl.3 HWI-D00169:39:D1Y16ACXX:7:1101:1787:2165 length=100
@ctl.5 HWI-D00169:39:D1Y16ACXX:7:1101:1853:2214 length=100
+ctl.5 HWI-D00169:39:D1Y16ACXX:7:1101:1853:2214 length=100
@ctl.6 HWI-D00169:39:D1Y16ACXX:7:1101:1773:2218 length=100
+ctl.6 HWI-D00169:39:D1Y16ACXX:7:1101:1773:2218 length=100


modified 18 months ago by sara.dandreano0 • written 5.8 years ago by biolab1.2k

Hi all, I'm trying to run fastq_to_fasta on FASTQ files from a MinION run (Nanopore Technologies), because I need a FASTA file to continue with my data analysis. When I run the command "fastq_to_fasta -i fileNanopore.fastq -o .fileNanopore.fasta" I get this error: fastq_to_fasta: Error: invalid quality score data on line 2060 (quality_tok = "+"

I don't understand this error. Could someone help me? Thank you in advance. Best regards, Sara

written 18 months ago by sara.dandreano0

Did you try adding -Q33 to your command line? Your nanopore data is in Sanger fastq format.

That said you should use or one of the newer tools for this.

modified 18 months ago • written 18 months ago by genomax89k

Dear genomax, thank you for your reply. I already tried -Q33, as suggested in another question I saw on Biostars, but it did not work. I will try Phred+33 as h.mon suggests. Thank you very much!

written 18 months ago by sara.dandreano0

The FASTX-Toolkit is very old, and was developed back when Illumina used what is called Phred+64 quality encoding. Later, Illumina moved to the original Sanger Phred+33 encoding, and nowadays I believe every sequencing platform uses Phred+33. Hence fastq_to_fasta has Phred+64 as its default, and you have to pass the -Q 33 argument if your file uses the Phred+33 encoding, as genomax pointed out. Read the FASTQ Wikipedia page for more information.
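If it is unclear which encoding a file uses, the lowest ASCII code found in its quality lines is a useful hint: codes below 59 cannot occur in Phred+64 data. The following is only a rough sketch, not part of the FASTX-Toolkit. The function name guess_offset is made up for illustration, and it assumes a plain four-line-per-record FASTQ file (no wrapped sequence lines).

```shell
# guess_offset FILE (hypothetical helper): scan every 4th line (the quality
# line) and report a guess based on the lowest ASCII code seen.
# Assumes 4 lines per record; wrapped FASTQ would need a real parser.
guess_offset() {
    awk '
        BEGIN {
            # awk has no ord(), so build a character -> code lookup table.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {
            n = length($0)
            for (j = 1; j <= n; j++) {
                ch = substr($0, j, 1)
                if ((ch in ord) && ord[ch] < min) min = ord[ch]
            }
        }
        END {
            # Codes below 59 are impossible under a +64 offset.
            print (min < 59 ? "Phred+33" : "Phred+64 or uniformly high Phred+33")
        }
    ' "$1"
}
```

Calling guess_offset in.fq on a file whose quality lines contain a "+" character (code 43) would report Phred+33. A result in the second category stays ambiguous, because a Phred+33 file with only very high qualities looks the same.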

Be aware that the FASTX-Toolkit is really old and was designed with short reads in mind. It may or may not work for long Nanopore reads, so be sure to double-check the integrity of the reads after the conversion.
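One simple way to do such a check is to compare the number of reads and the total number of bases before and after conversion. This is a minimal sketch under the assumption of four-line FASTQ records; the helper names fastq_stats and fasta_stats are invented here and do not come from any toolkit.

```shell
# fastq_stats FILE (hypothetical helper): print "<reads> <bases>" for a
# FASTQ file with exactly 4 lines per record.
fastq_stats() {
    awk 'NR % 4 == 2 { reads++; bases += length($0) }
         END { printf "%.0f %.0f\n", reads, bases }' "$1"
}

# fasta_stats FILE (hypothetical helper): print "<reads> <bases>" for a
# FASTA file; sequence lines may be wrapped.
fasta_stats() {
    awk '/^>/ { reads++; next }
         { bases += length($0) }
         END { printf "%.0f %.0f\n", reads, bases }' "$1"
}
```

If fastq_stats in.fq and fasta_stats out.fa print the same two numbers, no reads were dropped or truncated. Matching totals do not prove that every read is identical, but a mismatch is a clear sign that something was lost.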

written 18 months ago by h.mon31k

Dear h.mon, thank you for the suggestion. I will try with Phred+33! Best regards, Sara

written 18 months ago by sara.dandreano0

Phred+33 is selected with the -Q 33 option. Please follow our suggestions and use from BBMap suite. in=your.fastq out=new.fa
written 18 months ago by genomax89k
5.8 years ago by
Ashutosh Pandey12k wrote:

Add -Q33 to the command line. The FASTX-Toolkit assumes your FASTQ file is in Phred+64 format, whereas your file uses an offset of 33. When it subtracts 64 from 43, the decimal ASCII code of the "+" character, it gets a negative value of -21 and therefore throws an error. Read about different encodings here : 
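To make the arithmetic in this answer concrete, here is a small sketch that decodes the "+" quality character under both offsets. The function name phred_value is made up for this illustration and is not part of the FASTX-Toolkit.

```shell
# phred_value CHAR OFFSET (hypothetical helper): print ord(CHAR) - OFFSET.
phred_value() {
    # In POSIX printf, a leading single quote in the argument yields the
    # numeric code of the character that follows it.
    code=$(printf '%d' "'$1")
    echo $((code - $2))
}

phred_value '+' 64   # prints -21: invalid, which is the error reported above
phred_value '+' 33   # prints 10: a valid quality score under Phred+33
```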

modified 5.8 years ago • written 5.8 years ago by Ashutosh Pandey12k
5.8 years ago by
geek_y11k wrote:
Correct usage: fastq_to_fasta -Q33 -i in.fq -o out.fa

Simple Linux commands would also do that:

awk 'NR%4==1 { print ">" substr($0, 2) } NR%4==2 { print }' in.fq > out.fa
modified 5.8 years ago • written 5.8 years ago by geek_y11k

Thank you Ashutosh and Geek_y, your comments are really helpful. However, I have one further question. I tried two approaches to convert FASTQ to FASTA: one is fastq_to_fasta -Q33 -i in.fq -o out.fa, and the other is sed -n '1~4s/^@/>/p; 2~4p' in.fq > out.fa. The number of lines in the two out.fa files differs. Where is the problem? Thanks!
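One possible explanation, offered as an assumption rather than a confirmed diagnosis for this file: according to its usage text, fastq_to_fasta discards reads containing unknown (N) nucleotides unless -n is given, while the sed command keeps every read. A quick sketch to test that idea counts the reads with an N in the input. The helper name count_n_reads is invented for illustration, and four-line FASTQ records are assumed.

```shell
# count_n_reads FILE (hypothetical helper): number of reads whose sequence
# line (2nd line of each 4-line record) contains at least one N.
count_n_reads() {
    awk 'NR % 4 == 2 && /N/ { n++ } END { printf "%d\n", n }' "$1"
}
```

Each discarded read removes two lines from the FASTA output, so if the difference in wc -l between the two out.fa files equals twice the number printed by count_n_reads in.fq, the N filter accounts for the mismatch.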

modified 5.8 years ago • written 5.8 years ago by biolab1.2k
5.8 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

I suggest you try my reformat tool, which is (as far as I know) the fastest converter, at over 500 MB/s. It can handle various conversions (fastq, fasta+qual, fasta, sam, scarf, gzip, interleaved, dual-file, etc.), it autodetects quality encoding, and it can change between quality formats.
