I have some FASTQ files in the Illumina 1.3 format. What is the best way to convert them to Illumina 1.5?
Quoting Wikipedia, the difference between these two versions is that 1.5 doesn't use the symbols "@" and "A" for Phred scores, and that the "B" character has a different meaning:
...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
.................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
|                         |    |        |                              |                     |
                         -5....0........9.............................40
                               0........9.............................40
I - Illumina 1.3+ Phred+64, raw reads typically (0, 40)
J - Illumina 1.5+ Phred+64, raw reads typically (3, 40)
with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator
(scheme taken from http://en.wikipedia.org/wiki/FASTQ_format )
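As a quick check on the scheme above: in a Phred+64 encoding the score of a quality character is its ASCII code minus 64, so "@", "A" and "B" decode to 0, 1 and 2. A minimal sketch in POSIX shell (the function name `phred64` is made up for illustration and is not part of any tool):

```shell
# Print the Phred+64 score of a single quality character.
# phred64 is a hypothetical helper name used only for this illustration.
phred64() {
    # printf's leading-quote form ("'c") yields the character's numeric code
    code=$(printf '%d' "'$1")
    echo $((code - 64))
}

phred64 '@'   # prints 0
phred64 'A'   # prints 1
phred64 'B'   # prints 2
```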
Is it safe to simply convert all the "@" and "A" characters to "B"? Or should I convert all three ("@", "A" and "B") to "C" instead?
Should I use:
sed '4~4 s/[A@]/B/g'
or:
sed '4~4 s/[A@B]/C/g'
?
Thanks in advance.
see http://seqanswers.com/forums/showthread.php?t=5210 ?
Thank you Pierre, but that discussion is about converting Illumina 1.5 to Sanger. Maybe I could do a two-step conversion (1.3 > Sanger > 1.5), but if possible I would prefer to do it with a single sed script.
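A single-pass version of the substitution asked about in the question could look like the sketch below. It assumes GNU sed (the `4~4` address is a GNU extension), FASTQ files with exactly four lines per record, and that mapping "@" and "A" to "B" is acceptable for the downstream tools, which nothing in this thread confirms. The function name `illumina13_to_15` is made up for illustration.

```shell
# Rewrite Phred+64 quality lines so that '@' (Q0) and 'A' (Q1) become 'B' (Q2).
# Assumes GNU sed and strictly 4-line FASTQ records (no wrapped sequences).
# illumina13_to_15 is a hypothetical name, not an existing tool.
illumina13_to_15() {
    # 4~4 selects lines 4, 8, 12, ... (the quality lines);
    # y/@A/BB/ transliterates every '@' and 'A' on those lines to 'B'.
    sed '4~4 y/@A/BB/' "$@"
}

# Example: the header line keeps its leading '@'; only line 4 is changed.
printf '@read1\nACGT\n+\n@ABh\n' | illumina13_to_15
```

Because the address restricts the command to every fourth line, the "@" that starts each header line is left alone. On a file it could be called as `illumina13_to_15 reads.fastq > reads_15.fastq` (file names are placeholders).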