Convert Illumina 1.3 To Illumina 1.5
8.0 years ago

I have some FASTQ files in the Illumina 1.3 format. What is the best way to convert them to Illumina 1.5?

Quoting Wikipedia, the difference between these two versions is that 1.5 does not use the symbols "@" and "A" for Phred scores, and that the "B" character has a different meaning:

  ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
  .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  |                         |    |        |                              |                     |
                           -5....0........9.............................40
                                 0........9.............................40

I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)
J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40)
with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator


(scheme taken from http://en.wikipedia.org/wiki/FASTQ_format )
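
To make the mapping concrete: under Phred+64 each quality character encodes its score as the ASCII code minus 64. A minimal shell sketch (the `phred64` helper name is made up for illustration) prints the scores for the characters in question:

```shell
# phred64: hypothetical helper, prints the Phred+64 score of one quality character
phred64() {
  code=$(printf '%d' "'$1")   # a leading quote makes printf emit the character's numeric code
  echo $((code - 64))         # Phred+64 offset
}

for c in @ A B C h; do
  printf '%s = Q%d\n' "$c" "$(phred64 "$c")"
done
```

So "@" and "A" are Q0 and Q1, the two values marked unused in 1.5, "B" is Q2 and "h" is Q40.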

Is it safe to simply convert all the "@" and "A" characters to "B"? Or maybe I should convert them all to C?

Should I use:

sed '4~4s/[A@]/B/g'


or:

sed '4~4s/[A@B]/C/g'

?

illumina fastq format conversion • 2.6k views

Thank you Pierre, but that discussion is about converting Illumina 1.5 to Sanger. Maybe I could do a two-step conversion (1.3 > Sanger > 1.5), but if possible I would prefer to do it with a single sed script.

8.0 years ago

The crux of the problem is what to do with a base whose quality cannot be represented in 1.5.

Those are the calls with the lowest quality (most likely garbage). Since they are a useless measurement anyway, just convert them to the lowest value available in 1.5 and leave it at that; the call is wrong anyway.
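
The suggestion above could be sketched roughly as follows. This is only an illustration under some assumptions: the `to_illumina15` name is made up, records are assumed to be exactly four lines each (no wrapped sequences), and "B" (Q2) is taken as the lowest value in 1.5. awk is used here instead of sed so that it does not depend on the GNU-only `4~4` address.

```shell
# to_illumina15: hypothetical helper, not part of any toolkit.
# Reads 4-line FASTQ records on stdin and rewrites only the quality line
# (every 4th line), replacing '@' (Q0) and 'A' (Q1) with 'B' (Q2).
# Assumption: 'B' is the chosen floor; change the character class and the
# replacement to [@AB] and "C" to use Q3 as the floor instead.
to_illumina15() {
  awk 'NR % 4 == 0 { gsub(/[@A]/, "B") } { print }'
}

# Example: the header line also starts with '@' but is left untouched.
printf '@read1\nACGTAC\n+\n@ABChh\n' | to_illumina15
```

Restricting the substitution to every fourth line matters because "@" also starts the header line and "A" appears in the sequence itself.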