Question: fastx toolkit problem: fastq to fasta
biolab wrote (4.8 years ago):

Hi everyone,

I am converting FASTQ to FASTA with the FASTX-Toolkit using the following command: fastq_to_fasta -i in.fq -o out.fa

However, this error message pops up:

fastq_to_fasta: Invalid quality score value (char '+' ord 43 quality value -21) on line 12

Here are the first 16 lines of in.fq. What's wrong with line 12? Thank you very much!

@ctl.2 HWI-D00169:39:D1Y16ACXX:7:1101:1639:2164 length=100
AATAGTGGAGTGTATTTCACGTCATTTATCATTATCATTTAGTTCAGTTTTAATTTTATTTAGTTTTGTACAATTTCAATCAAAAACAGGAGTTCAGGGA
+ctl.2 HWI-D00169:39:D1Y16ACXX:7:1101:1639:2164 length=100
@?@DDDDFHHFD<FFHFEIHGIIGEHIEIIAHHCFHBGHH9DGG@CDDFGICBBFCGIGHGGIGIIIIHEFIGEGFHGGFHIEHICEEHHEEBBCECEED
@ctl.3 HWI-D00169:39:D1Y16ACXX:7:1101:1787:2165 length=100
GTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTTAAGTCCGCCGTCAATTCCCAGGGCTCAACCCTGGACAGGCGGTGGAAACTACCAA
+ctl.3 HWI-D00169:39:D1Y16ACXX:7:1101:1787:2165 length=100
BBBFFFFFHHFHHJJJJJJJJJJJJJJJIIJJIJJJFGGIIIIJIJJJJIJIJJHFFFDEEEEDDDDDDDDCDDD@BCBDBDDDDDD9@>BDCDDDDDDD
@ctl.5 HWI-D00169:39:D1Y16ACXX:7:1101:1853:2214 length=100
GAACCCATGAGGCACGCTGCGTGAGCCGCACCGCGCTGCTACTGGCGTTGGAGGAAGAGCTCCCAAGAGGCACCATCCGCTACTCCTCCAAGATCGTCTC
+ctl.5 HWI-D00169:39:D1Y16ACXX:7:1101:1853:2214 length=100
@@<DDDDDFHF?+<AE@GHGG@EGHBCF<D@77-;45@4?EAHEB;99?@?C;?BBA5<(5>@?9?A??B??AB<@?A@B>@BC>9@C??C@<AC?<A<<
@ctl.6 HWI-D00169:39:D1Y16ACXX:7:1101:1773:2218 length=100
AGGGGAGCCGGCGACCGAAGCCCCGGTGAACGGCGGCCGTAACAATAACGGTCCTAAGGTAGCGAAATTCCTTGTCGGGTAAGTTCCGACCCGCACGAAA
+ctl.6 HWI-D00169:39:D1Y16ACXX:7:1101:1773:2218 length=100
@@@DDDD:F@F:FGII)0-;FF@5AB'5?B;?<6;5B-707B@BB8333802?5>@B>BBBB<5;5>?B:44@4@49@B#####################

 


Hi all, I'm trying to use the command "fastq_to_fasta" on FASTQ files from a MinION run (Oxford Nanopore Technologies) because I need a FASTA file to continue with the data analysis. When I run "fastq_to_fasta -i fileNanopore.fastq -o .fileNanopore.fasta" I get this error: fastq_to_fasta: Error: invalid quality score data on line 2060 (quality_tok = "+"

I don't understand this error; could someone help me? Thank you in advance. Best regards, Sara

(reply by sara.dandreano, 5 months ago)

Did you try adding -Q33 to your command line? Your Nanopore data is in Sanger FASTQ format (Phred+33).

That said, you should use reformat.sh or one of the newer tools for this.

(reply by genomax, 5 months ago)

Dear genomax, thank you for your reply. I already tried -Q33, as suggested in another question I saw on Biostars, but it did not work. I will try Phred+33 as h.mon suggests. Thank you very much!

(reply by sara.dandreano, 5 months ago)

The FASTX-Toolkit is very old and was developed back when Illumina used what is called Phred+64 quality encoding. Later, Illumina moved to the original Sanger Phred+33 encoding, and nowadays I believe every sequencing platform uses Phred+33. Hence fastq_to_fasta has Phred+64 as its default, and you have to use the -Q 33 argument if your file uses Phred+33 encoding, as genomax pointed out. Read the FASTQ Wikipedia page for more information.

Be aware that the FASTX-Toolkit is really old and was designed with short reads in mind; it may or may not work for long Nanopore reads, so be sure to double-check the integrity of the reads after the conversion.
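A minimal way to do that double-check is to compare the number of reads and the total number of bases before and after conversion. The function below, check_conversion, is a hypothetical helper (not part of the FASTX-Toolkit or BBMap); it assumes the FASTQ uses plain 4-line records, while the FASTA may wrap sequences over several lines.

```shell
#!/usr/bin/env bash
# check_conversion: hypothetical helper, not part of any toolkit.
# Assumes plain 4-line FASTQ records; the FASTA may be line-wrapped.
# Returns 0 if read counts and total bases match, non-zero otherwise.
check_conversion() {
    local fq=$1 fa=$2
    local fq_stats fa_stats
    # Sequence is line 2 of every 4-line FASTQ record.
    fq_stats=$(awk 'NR % 4 == 2 { n++; b += length($0) } END { print n + 0, b + 0 }' "$fq")
    # Records start with '>'; every other line is sequence.
    fa_stats=$(awk '/^>/ { n++; next } { b += length($0) } END { print n + 0, b + 0 }' "$fa")
    echo "FASTQ reads/bases: $fq_stats"
    echo "FASTA reads/bases: $fa_stats"
    [ "$fq_stats" = "$fa_stats" ]
}
```

Usage: check_conversion in.fq out.fa && echo "counts match". A mismatch means reads were dropped or truncated during conversion, which is worth investigating before any downstream analysis.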

(reply by h.mon, 5 months ago)

Dear h.mon, thank you for the suggestion, I will try Phred+33! Best regards, Sara

(reply by sara.dandreano, 5 months ago)

Phred+33 is selected with the -Q 33 option. Please follow our suggestions and use reformat.sh from the BBMap suite:

reformat.sh in=your.fastq out=new.fa

(reply by genomax, 5 months ago)

Ashutosh Pandey (Philadelphia) wrote (4.8 years ago):

Add -Q33 to the command line. The FASTX-Toolkit assumes your FASTQ file is in Phred+64 format, whereas your file has an offset of 33. So when it subtracts 64 from 43, the decimal ASCII value of the "+" character, it gets a negative value of -21 and therefore throws an error. Read about the different encodings here: http://en.wikipedia.org/wiki/FASTQ_format
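The arithmetic in the error can be checked directly in the shell. The sketch below also defines guess_offset, a hypothetical helper (not part of the FASTX-Toolkit) that guesses the offset from the quality lines, relying on the fact that characters below ";" (ASCII 59) never occur in Phred+64 (or Solexa+64) data. It assumes plain 4-line FASTQ records and looks only at the first 10,000 reads by default, so treat its answer as a heuristic.

```shell
#!/usr/bin/env bash
# '+' is ASCII 43; under the Phred+64 default, 43 - 64 = -21,
# the invalid score reported in the error message.
ord=$(printf '%d' "'+")
echo "ord('+') = $ord, as Phred+64: $((ord - 64)), as Phred+33: $((ord - 33))"
# prints: ord('+') = 43, as Phred+64: -21, as Phred+33: 10

# guess_offset: hypothetical helper. Prints 33, 64, or "unknown".
# Scans quality lines (line 4 of each 4-line record) for the lowest
# ASCII code; anything below 59 (';') implies Phred+33.
guess_offset() {
    local file=$1 nrec=${2:-10000}
    LC_ALL=C awk -v max="$nrec" '
        BEGIN { for (i = 33; i < 127; i++) code[sprintf("%c", i)] = i; min = 127 }
        NR % 4 == 0 {
            for (i = 1; i <= length($0); i++) {
                c = code[substr($0, i, 1)]
                if (c < min) min = c
            }
            if (NR / 4 >= max) exit
        }
        END {
            if (min == 127) print "unknown"
            else print (min < 59 ? 33 : 64)
        }
    ' "$file"
}
```

For example: fastq_to_fasta -Q"$(guess_offset in.fq)" -i in.fq -o out.fa, after checking that the helper did not print "unknown". Note that a Phred+33 file whose scores never drop below Q26 would be reported as 64, so an answer of 64 deserves a second look.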

geek_y (Barcelona) wrote (4.8 years ago):
Correct usage: fastq_to_fasta -Q33 -i in.fq -o out.fa

Simple Linux commands can do that too; this replaces the leading "@" of each header with ">" and keeps the sequence line:

awk 'NR % 4 == 1 { print ">" substr($0, 2) } NR % 4 == 2 { print }' in.fq > out.fa

Thank you Ashutosh and geek_y, your comments are really helpful. However, I have one further question: I tried two approaches to convert FASTQ to FASTA. One is fastq_to_fasta -Q33 -i in.fq -o out.fa, and the other is sed -n '1~4s/^@/>/p; 2~4p' in.fq > out.fa. I found that the two out.fa files have different numbers of lines. Where is the problem? Thanks!
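One plausible cause, not confirmed in this thread: fastq_to_fasta discards reads containing unknown (N) bases unless the -n option is given, whereas the sed command keeps every record. A quick way to test that assumption is to count such reads; count_n_reads below is a hypothetical helper and assumes plain 4-line FASTQ records, as in the example above.

```shell
#!/usr/bin/env bash
# count_n_reads: hypothetical helper. Counts reads whose sequence
# line (line 2 of each 4-line record) contains an uppercase N.
count_n_reads() {
    awk 'NR % 4 == 2 && /N/ { n++ } END { print n + 0 }' "$1"
}
```

Both outputs put each sequence on a single line, so if dropped N-containing reads are the only difference, the sed output should have exactly 2 × $(count_n_reads in.fq) more lines than the fastq_to_fasta output. Re-running with fastq_to_fasta -n -Q33 -i in.fq -o out.fa should then make the counts agree.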

(reply by biolab, 4.8 years ago)

Brian Bushnell (Walnut Creek, USA) wrote (4.8 years ago):

I suggest you try my reformat tool (reformat.sh, from the BBMap suite), which is, as far as I know, the fastest converter, at over 500 MB/s. It can handle various conversions (FASTQ, FASTA+qual, FASTA, SAM, SCARF, gzip, interleaved, dual-file, etc.); it autodetects the quality encoding and can convert between quality formats.
