Question: FASTX-Toolkit problem: FASTQ to FASTA
biolab wrote (4.3 years ago):

Hi everyone,

I converted FASTQ to FASTA with the FASTX-Toolkit using the following command:

fastq_to_fasta -i in.fq -o out.fa

However, this error message popped up:

fastq_to_fasta: Invalid quality score value (char '+' ord 43 quality value -21) on line 12


Below are the first 16 lines of in.fq. What's wrong with line 12? Thank you very much!

@ctl.2 HWI-D00169:39:D1Y16ACXX:7:1101:1639:2164 length=100
AATAGTGGAGTGTATTTCACGTCATTTATCATTATCATTTAGTTCAGTTTTAATTTTATTTAGTTTTGTACAATTTCAATCAAAAACAGGAGTTCAGGGA
+ctl.2 HWI-D00169:39:D1Y16ACXX:7:1101:1639:2164 length=100
@?@DDDDFHHFD<FFHFEIHGIIGEHIEIIAHHCFHBGHH9DGG@CDDFGICBBFCGIGHGGIGIIIIHEFIGEGFHGGFHIEHICEEHHEEBBCECEED
@ctl.3 HWI-D00169:39:D1Y16ACXX:7:1101:1787:2165 length=100
GTTATCCGGAATGATTGGGCGTAAAGCGTCTGTAGGTGGCTTTTTAAGTCCGCCGTCAATTCCCAGGGCTCAACCCTGGACAGGCGGTGGAAACTACCAA
+ctl.3 HWI-D00169:39:D1Y16ACXX:7:1101:1787:2165 length=100
BBBFFFFFHHFHHJJJJJJJJJJJJJJJIIJJIJJJFGGIIIIJIJJJJIJIJJHFFFDEEEEDDDDDDDDCDDD@BCBDBDDDDDD9@>BDCDDDDDDD
@ctl.5 HWI-D00169:39:D1Y16ACXX:7:1101:1853:2214 length=100
GAACCCATGAGGCACGCTGCGTGAGCCGCACCGCGCTGCTACTGGCGTTGGAGGAAGAGCTCCCAAGAGGCACCATCCGCTACTCCTCCAAGATCGTCTC
+ctl.5 HWI-D00169:39:D1Y16ACXX:7:1101:1853:2214 length=100
@@<DDDDDFHF?+<AE@GHGG@EGHBCF<D@77-;45@4?EAHEB;99?@?C;?BBA5<(5>@?9?A??B??AB<@?A@B>@BC>9@C??C@<AC?<A<<
@ctl.6 HWI-D00169:39:D1Y16ACXX:7:1101:1773:2218 length=100
AGGGGAGCCGGCGACCGAAGCCCCGGTGAACGGCGGCCGTAACAATAACGGTCCTAAGGTAGCGAAATTCCTTGTCGGGTAAGTTCCGACCCGCACGAAA
+ctl.6 HWI-D00169:39:D1Y16ACXX:7:1101:1773:2218 length=100
@@@DDDD:F@F:FGII)0-;FF@5AB'5?B;?<6;5B-707B@BB8333802?5>@B>BBBB<5;5>?B:44@4@49@B#####################


Ashutosh Pandey (Philadelphia) wrote (4.3 years ago):

Add -Q33 to the command line. The FASTX-Toolkit assumes your FASTQ file uses Phred+64 encoding, whereas your file uses an offset of 33. So when it subtracts 64 from 43 (the decimal ASCII code of the "+" character), it gets a negative quality value of -21 and therefore throws an error. Read about the different encodings here: http://en.wikipedia.org/wiki/FASTQ_format
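To see the arithmetic behind the error message, and to check a file's encoding before converting it, here is a minimal shell sketch. Both `char_code` and `guess_offset` are hypothetical helper names, and the offset check is only a heuristic (not part of the FASTX-Toolkit): it assumes strict 4-line records and relies on the fact that characters '!' (33) through ':' (58) can only occur in Phred+33 quality strings.

```bash
# char_code (hypothetical helper): with a leading quote, printf's %d
# prints the character's ASCII code.
char_code() { printf '%d' "'$1"; }

plus=$(char_code '+')
echo "ASCII code of '+':   $plus"            # 43
echo "decoded as Phred+64: $((plus - 64))"   # -21, invalid, hence the error
echo "decoded as Phred+33: $((plus - 33))"   # 10, a valid quality score

# guess_offset (hypothetical heuristic): Phred+64 and Solexa+64 qualities
# start at ';' (59) or above, so any quality character in '!'..':' means
# Phred+33. Assumes 4-line records (quality string = every 4th line).
guess_offset() {
  if awk 'NR % 4 == 0' "$1" | LC_ALL=C grep -q '[!-:]'; then
    echo 33
  else
    echo 64  # note: high-quality Phred+33 data can also end up here
  fi
}
```

For the in.fq shown in the question, `guess_offset in.fq` should report 33 (line 12 contains '+', among others), which is why -Q33 is needed.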

geek_y (Barcelona/CRG/London/Imperial) wrote (4.3 years ago):

Correct usage:

fastq_to_fasta -Q33 -i in.fq -o out.fa

Simple Linux commands can also do the conversion:

awk 'NR%4==1 { print ">" substr($0, 2) } NR%4==2 { print }' in.fq > out.fa
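The awk approach assumes every record is exactly four lines (no line-wrapped sequences). A minimal sketch that wraps the conversion in a function and then checks that no record was lost; `fq2fa` and `count_check` are hypothetical helper names, not existing tools:

```bash
# fq2fa (hypothetical helper): header line -> '>' + header without '@',
# sequence line printed as-is; '+' and quality lines are dropped.
fq2fa() {
  awk 'NR % 4 == 1 { print ">" substr($0, 2) }
       NR % 4 == 2 { print }' "$1"
}

# count_check (hypothetical helper): compares the number of FASTQ records
# (line count / 4) with the number of FASTA headers.
count_check() {
  local fq_lines fa_records
  fq_lines=$(wc -l < "$1")
  fa_records=$(grep -c '^>' "$2")
  if [ $((fq_lines / 4)) -eq "$fa_records" ]; then
    echo "OK: $fa_records records"
  else
    echo "MISMATCH: $((fq_lines / 4)) FASTQ vs $fa_records FASTA records" >&2
    return 1
  fi
}
```

Usage: `fq2fa in.fq > out.fa && count_check in.fq out.fa`.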

biolab replied:

Thank you Ashutosh and geek_y, your comments are really helpful. However, I have one further question. I tried two approaches to convert FASTQ to FASTA:

fastq_to_fasta -Q33 -i in.fq -o out.fa
sed -n '1~4s/^@/>/p; 2~4p' in.fq > out.fa

The two out.fa files have different numbers of lines. Where is the problem? Thanks!
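One likely cause (not confirmed in this thread): by default fastq_to_fasta discards reads that contain unknown (N) nucleotides, and its -n option keeps them, whereas the sed command keeps every read. A minimal sketch to test that explanation, assuming 4-line records and uppercase N; `count_n_reads` and `compare_fasta` are hypothetical helper names:

```bash
# count_n_reads (hypothetical helper): number of reads whose sequence
# line (line 2 of each 4-line record) contains an uppercase N.
count_n_reads() {
  awk 'NR % 4 == 2 && /N/ { n++ } END { print n + 0 }' "$1"
}

# compare_fasta (hypothetical helper): $1 = input FASTQ,
# $2 = fastq_to_fasta output, $3 = sed output.
# Succeeds if the missing records are exactly the N-containing reads.
compare_fasta() {
  local n_reads fastx_recs sed_recs
  n_reads=$(count_n_reads "$1")
  fastx_recs=$(grep -c '^>' "$2")
  sed_recs=$(grep -c '^>' "$3")
  echo "fastx: $fastx_recs records, sed: $sed_recs records, reads with N: $n_reads"
  [ $((sed_recs - fastx_recs)) -eq "$n_reads" ]
}
```

If `compare_fasta in.fq fastx.fa sed.fa` succeeds, rerunning with `fastq_to_fasta -Q33 -n -i in.fq -o out.fa` should keep the N-containing reads and make the two outputs agree.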

Brian Bushnell (Walnut Creek, USA) wrote (4.3 years ago):

I suggest you try my reformat tool (reformat.sh, part of BBTools), which is (as far as I know) the fastest converter, at over 500 MB/s. It handles various conversions (FASTQ, FASTA+QUAL, FASTA, SAM, SCARF, gzip, interleaved, dual-file, etc.), autodetects the quality encoding, and can convert between quality formats.

Powered by Biostar version 2.3.0