Question

Convert Illumina 1.3 To Illumina 1.5

0

11.6 years ago

Giovanni M Dall'Olio 28k

I have some FASTQ files in the Illumina 1.3 format. What is the best way to convert them to Illumina 1.5?

Quoting Wikipedia, the difference between these two versions is that 1.5 doesn't use the symbols "@" and "A" for Phred scores, and that the "B" character has a different meaning:

  ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
  .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  |                         |    |        |                              |                     |
                           -5....0........9.............................40 
                                 0........9.............................40 

 I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)
 J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40)
     with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator

(scheme taken from http://en.wikipedia.org/wiki/FASTQ_format )
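
To make the offsets in the scheme concrete, here is a small Python sketch that decodes Phred+64 characters into scores. It is only an illustration of the arithmetic in the scheme above; the function name `phred64_score` is made up for this example.

```python
def phred64_score(ch: str) -> int:
    """Return the Phred score encoded by a single Phred+64 quality character."""
    return ord(ch) - 64  # Phred+64: score = ASCII code - 64


# '@' and 'A' are the two characters that 1.5 leaves unused;
# 'B' is the Read Segment Quality Control Indicator in 1.5.
for ch in "@ABCh":
    print(ch, phred64_score(ch))
# prints:
# @ 0
# A 1
# B 2
# C 3
# h 40
```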

Is it safe to simply convert all the "@" and "A" characters to "B"? Or maybe I should convert them all to C?

Should I use:

sed '4~4 s/[A@]/B/g'

or:

sed '4~4 s/[A@B]/C/g'

?

Thanks in advance.

illumina fastq format conversion • 3.7k views

ADD COMMENT • link updated 11.6 years ago by Istvan Albert 102k • written 11.6 years ago by Giovanni M Dall'Olio 28k

0

see http://seqanswers.com/forums/showthread.php?t=5210 ?

ADD REPLY • link 11.6 years ago by Pierre Lindenbaum 166k

0

Thank you Pierre, but that discussion is about converting Illumina 1.5 to Sanger. Maybe I could do a two-step conversion (1.3 > Sanger > 1.5), but if possible I would prefer to do it with a single sed script.

ADD REPLY • link 11.6 years ago by Giovanni M Dall'Olio 28k

score 1 · Answer 1 · 2013-12-02

1

11.6 years ago

Istvan Albert 102k

The crux of the problem is what to do with a base whose quality value is not represented in 1.5.

Those are the lowest-quality calls (most likely garbage) and are useless measurements anyway, so just convert them to the lowest value available in 1.5 and leave it at that. The value is wrong either way.
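
A minimal Python sketch of that approach, under two assumptions that are not settled in this thread: the FASTQ file uses strict four-line records (no wrapped sequence or quality lines), and "the lowest value in 1.5" is taken to be "B" (score 2, the Read Segment Quality Control Indicator). If you would rather not produce that indicator, map to "C" instead. The names `TO_ILLUMINA15` and `convert_fastq` are hypothetical.

```python
# '@' (Q0) and 'A' (Q1) are unused in Illumina 1.5.
# Assumption: both are mapped to 'B' (Q2), taken here as the lowest 1.5 value.
TO_ILLUMINA15 = str.maketrans("@A", "BB")


def convert_fastq(lines):
    """Yield FASTQ lines with '@' and 'A' replaced by 'B' on quality lines only."""
    for i, line in enumerate(lines):
        # With four lines per record, every 4th line holds the qualities.
        # Header lines also start with '@', so they must be left untouched.
        if i % 4 == 3:
            line = line.translate(TO_ILLUMINA15)
        yield line
```

To use it as a filter, something like `sys.stdout.writelines(convert_fastq(sys.stdin))` (after `import sys`) should do. Restricting the translation to every fourth line is what keeps the "@" at the start of each read header intact.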

ADD COMMENT • link 11.6 years ago by Istvan Albert 102k