Convert Illumina Reads To Sanger Score
11.4 years ago

What tool do you use to convert Illumina paired-end reads (Illumina's FASTQ is encoded in ASCII-64) to Sanger scores (ASCII-33)?

I am looking at two methods included in maq (both written by lh3). Do you use one of these methods, or would you recommend another tool?

illumina short next-gen sequencing • 14k views

fq_all2std.pl is outdated...


What does that mean? Is the script not suitable?


I think converting to the Sanger scale should be the first step.


@lh3: Thanks for the info. I am wondering whether I should do the data cleanup before or after converting the Illumina scores to Sanger scores. Please let me know your thoughts.


Thanks @lh3 !!


If you run a batch of mixed-format files (some Illumina, some Sanger) through maq or seqret to bring them all to ASCII-33, will these two tools skip the files that are already in Sanger (ASCII-33) format and convert only the ASCII-64 files?
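
For the mixed-format case, one option is to check each file before converting it. Here is a rough Python sketch of the usual heuristic: look at the range of quality characters and guess the offset. The function name `guess_offset` and the cut-off at 'J' are my own assumptions, not something maq or seqret provides, and files containing only high-quality bases can be genuinely ambiguous.

```python
def guess_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from an iterable of quality strings.

    Returns None when the observed characters fit both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        # Characters below ';' (59) cannot occur in any +64 encoding.
        return 33
    if max(codes) > 74:
        # Assumption: Phred+33 reads rarely go above 'J' (Q41).
        return 64
    return None  # ambiguous: every character is valid in both encodings


if __name__ == "__main__":
    print(guess_offset(["IIII#5"]))  # contains '#', so Phred+33
    print(guess_offset(["hhhhB"]))   # contains 'h', so most likely Phred+64
```

Pass it the quality line (every fourth line) of a few thousand records per file and only convert the files that come back as 64.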

11.4 years ago

We use emboss seqret:


$EMBOSS_HOME/seqret fastq-illumina::phred64Data.fastq fastq::phred33Data.fastq


Is there documentation for which FASTQ dialects seqret supports and how to reference those dialects in a seqret command? I'm not seeing it in the EMBOSS documentation.

11.4 years ago
Farhat ★ 2.9k

Galaxy's FASTQ groomer will do this job if you don't mind the web interface.


Thanks Farhat. I am looking at a non-Galaxy solution at the moment.

11.4 years ago
brentp 24k

I haven't used this particular tool, but here is a tool built with Jim Kent's libraries to do the conversion 64to33 (or 33to64).

It comes with a makefile and all the includes necessary so it should be quite fast.

EDIT: there's also a very nice C-API in the Kent-tools: https://github.com/jstjohn/KentLib/blob/master/lib/fastq.c The function signature looks like:

inline void phred64ToPhred33( char * p64, int l)

So it should be easy to use.
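
For those not comfortable with C, the same idea can be sketched in a few lines of Python. This is not the Kent library code, just an illustration of what a 64-to-33 conversion does: every quality character is shifted down by 31 (64 minus 33). The function names are made up for this example, and it assumes plain four-line FASTQ records with Illumina 1.3+ (Phred+64) qualities, not the older Solexa scale.

```python
def phred64_to_phred33(qual):
    """Shift a Phred+64 quality string down to Phred+33."""
    shifted = []
    for ch in qual:
        code = ord(ch) - 31  # 64 - 33
        if code < 33:
            # A character below '@' means the input was not Phred+64.
            raise ValueError("quality character %r is not valid Phred+64" % ch)
        shifted.append(chr(code))
    return "".join(shifted)


def convert_fastq_lines(lines):
    """Yield FASTQ lines with every fourth line (the qualities) converted.

    Assumes unwrapped four-line records.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield phred64_to_phred33(line) if i % 4 == 3 else line


if __name__ == "__main__":
    record = ["@read1\n", "ACGT\n", "+\n", "hhh@\n"]
    print("\n".join(convert_fastq_lines(record)))
```

The ValueError doubles as a cheap sanity check: a file that is already Phred+33 will almost always trip it on the first few reads.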


Thanks Brent. I noticed that you mentioned a potential issue with paired-end reads that is not handled by fastx_toolkit (Filtering Paired End Reads). Does this tool take care of that?


If you're just converting, not filtering, that won't be an issue. If you filter after the conversion (for whatever reason), then yes, you'll probably have to figure out how to make sure you get neither or both reads.
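
To make the "neither or both" point concrete, here is a small Python sketch that filters mates together. The mean-quality threshold and the function names are purely illustrative assumptions, not what fastx_toolkit or any other tool does; the only point is that the keep/drop decision is made once per pair.

```python
def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (0.0 for an empty string)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_pairs(pairs, min_mean_q=20, offset=33):
    """Yield only the pairs in which BOTH mates pass the threshold.

    `pairs` is an iterable of (read1, read2); each read is (name, seq, qual).
    """
    for read1, read2 in pairs:
        if (mean_quality(read1[2], offset) >= min_mean_q
                and mean_quality(read2[2], offset) >= min_mean_q):
            yield read1, read2


if __name__ == "__main__":
    pairs = [
        (("a/1", "ACGT", "IIII"), ("a/2", "ACGT", "IIII")),  # both mates good
        (("b/1", "ACGT", "IIII"), ("b/2", "ACGT", "####")),  # second mate poor
    ]
    for read1, read2 in filter_pairs(pairs):
        print(read1[0], read2[0])
```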


Thanks. I have posted another question on filtering. I am not sure whether filtering the reads before or after the QC makes much difference.


I'm not so familiar with C. If I want to use this script to convert fastq illumina quality score to fastq sanger quality score, what command should I run?

11.3 years ago
Weronika ▴ 300

You can use the HTSeq package in Python: http://www-huber.embl.de/users/anders/HTSeq/doc/sequences.html#sequences. It will read FASTQ files with any of the common quality encodings, but always writes using the Sanger (Phred) encoding.

11.4 years ago
Bioquant ▴ 160

A quick and dirty solution would be to make a FASTA file with Ns or any other rarely occurring homopolymer sequence of length equal to your read length. Align your FASTQ file against this reference with any quality-aware aligner like BWA or Bowtie to get the BAM file (you can parallelize it for speed). You need to tell the aligner that the input is ASCII-64, e.g. with `bwa aln -I` or `bowtie --phred64-quals`. By definition, quality scores in a BAM file are recorded as Sanger scores. All the reads will be reported only once, as unaligned reads. You can then use Picard to get the FASTQ back from the BAM file with Sanger scores!

Note: there is a difference in the way quality scores are recorded in Illumina FASTQ files before and after version 1.3 of Casava.

http://en.wikipedia.org/wiki/FASTQ_format
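
To illustrate why that difference matters: the older files store Solexa scores, which are on a different scale from Phred scores, so a plain offset shift is not enough. The Wikipedia page above gives the mapping Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). A minimal Python sketch of it follows; the function names are my own, and rounding to the nearest integer is an assumption about how you want to store the result.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa quality score to the Phred scale (as a float)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def solexa64_to_sanger33(qual):
    """Re-encode a Solexa+64 quality string as Sanger (Phred+33)."""
    out = []
    for ch in qual:
        q_phred = round(solexa_to_phred(ord(ch) - 64))
        out.append(chr(q_phred + 33))
    return "".join(out)


if __name__ == "__main__":
    for q in (-5, 0, 10, 40):
        print(q, round(solexa_to_phred(q), 2))
```

The two scales only differ noticeably at the low end; above roughly Q10 the values are nearly identical.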

10.2 years ago

For anyone dealing with the various FASTQ encoding schemes and looking to sanity-check their method of conversion, or who just wants to look up the Phred score for a specific ASCII code under one of the schemes: I have found this blog entry extremely useful. It provides "Sanger (And Illumina 1.3+ (And Solexa)) Phred Score (Q) ASCII Glyph Base Error Conversion Tables".
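
If you only need a quick lookup, the Phred part of such a table is easy to regenerate yourself. Below is a small Python sketch (the name `phred_info` is made up for this example) that maps a glyph to its score Q and the base error probability P = 10^(-Q/10). It covers Phred scores with either offset, not the Solexa scale.

```python
def phred_info(glyph, offset=33):
    """Return (Q, error probability) for one quality character."""
    q = ord(glyph) - offset
    if q < 0:
        raise ValueError("glyph %r is below the offset %d" % (glyph, offset))
    return q, 10 ** (-q / 10)


if __name__ == "__main__":
    # Print a few rows for both offsets to compare against a reference table.
    for glyph in "!5I":
        print("Phred+33", glyph, *phred_info(glyph, 33))
    for glyph in "@Th":
        print("Phred+64", glyph, *phred_info(glyph, 64))
```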


damn, the link you posted has been vandalized!


Is nothing sacred? I emailed the author to let him know.


Fixed. Site is back up.