Question: Convert Illumina Reads To Sanger Score
4
6.6 years ago by
Manhattan, NY
Khader Shameer17k wrote:

What tool do you use to convert Illumina paired-end reads (Illumina's FASTQ qualities are encoded with an ASCII offset of 64) to Sanger scores (ASCII offset of 33)?

I am looking at two methods included in maq (both written by lh3). Do you use one of these methods, or do you recommend any other tools?

ADD COMMENTlink modified 4.3 years ago by Biostar ♦♦ 20 • written 6.6 years ago by Khader Shameer17k
3

fq_all2std.pl is outdated...

ADD REPLYlink written 6.6 years ago by lh330k

What does that mean? Is the script not suitable?

ADD REPLYlink written 2.0 years ago by Shicheng Guo4.5k
3

I think converting to the sanger scale should be the first step.

ADD REPLYlink written 6.6 years ago by lh330k

@lh3: Thanks for the info. I am wondering whether I should do the data cleanup before or after converting the Illumina scores to Sanger scores. Please let me know your thoughts.

ADD REPLYlink written 6.6 years ago by Khader Shameer17k

Thanks @lh3 !!

ADD REPLYlink written 6.6 years ago by Khader Shameer17k

Suppose you run a batch of files in mixed formats (some Illumina, some Sanger) through maq or seqret to get them all into ASCII-33 format. Will these two tools skip the files that are already in Sanger (ASCII-33) format and convert only the ASCII-64 files to ASCII-33?
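
One way to sidestep the question above is to check each file's encoding before converting it. The following is a minimal sketch of a range-based heuristic, not a description of what maq or seqret do. The function name `guess_offset` and the cutoffs are assumptions: characters below `;` (code 59) cannot occur with an offset of 64, and characters above `J` (code 74) are assumed not to occur in typical offset-33 Illumina output.

```python
def guess_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from a sample of quality strings.

    Heuristic only: returns None when the sample is ambiguous.
    Raises ValueError if the sample contains no quality characters.
    """
    codes = [ord(c) for q in quality_strings for c in q]
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Even the old Solexa scale (minimum -5, offset 64) never goes below 59.
        return 33
    if highest > 74:
        # Assumption: offset-33 data from Illumina rarely exceeds 'J' (Q41).
        return 64
    return None  # every character is valid under both offsets


print(guess_offset(["IIII!!"]))
print(guess_offset(["hhhh"]))
```

A file for which this returns 33 could be left alone, and only files that return 64 passed to the converter. An ambiguous result (`None`) deserves a larger sample or a manual look.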

ADD REPLYlink written 2.6 years ago by bioinfo650
9
6.5 years ago by
Montreal
Louis Letourneau790 wrote:

We use emboss seqret:


$EMBOSS_HOME/seqret fastq-illumina::phred64Data.fastq fastq::phred33Data.fastq
ADD COMMENTlink written 6.5 years ago by Louis Letourneau790

Is there documentation for which Fastq dialects seqret support and how to reference those dialects with a seqret command? I'm not seeing it in the EMBOSS documentation.

ADD REPLYlink written 4.7 years ago by Daniel Standage3.7k

Just found it: http://emboss.sourceforge.net/docs/themes/SequenceFormats.html

ADD REPLYlink written 4.7 years ago by Daniel Standage3.7k
3
6.6 years ago by
Pune, India
Farhat2.8k wrote:

Galaxy's FASTQ groomer will do this job if you don't mind the web interface.

ADD COMMENTlink written 6.6 years ago by Farhat2.8k

Thanks Farhat. I am looking at a non-Galaxy solution at the moment.

ADD REPLYlink written 6.6 years ago by Khader Shameer17k
2
6.6 years ago by
Salt Lake City, UT
brentp22k wrote:

I haven't used this particular tool, but here is a tool built with Jim Kent's libraries to do the conversion 64to33 (or 33to64).

It comes with a makefile and all the includes necessary so it should be quite fast.

EDIT: there's also a very nice C-API in the Kent-tools: https://github.com/jstjohn/KentLib/blob/master/lib/fastq.c The function signature looks like:

inline void phred64ToPhred33( char * p64, int l)

So it should be easy to use.
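
For readers who do not work in C, the same idea can be sketched in Python: subtracting 31 from each character code moves a Phred+64 quality string to Phred+33. This is a minimal sketch, not the KentLib implementation. The function names are hypothetical, and it assumes strict four-line FASTQ records with no wrapped sequence or quality lines.

```python
def phred64_to_phred33(qual):
    """Shift a Phred+64 quality string to Phred+33 (subtract 31 per character)."""
    shifted = []
    for ch in qual:
        code = ord(ch) - 31
        if code < 33:
            # Characters below '@' are not valid Phred+64 (old Solexa scores
            # can go lower and need a different conversion).
            raise ValueError(f"{ch!r} is not a valid Phred+64 quality character")
        shifted.append(chr(code))
    return "".join(shifted)


def convert_records(lines):
    """Yield FASTQ lines with every fourth line (the quality line) re-encoded.

    Assumes strict 4-line records; trailing newlines are stripped.
    """
    for i, line in enumerate(lines):
        text = line.rstrip("\n")
        yield phred64_to_phred33(text) if i % 4 == 3 else text


record = ["@read1\n", "ACGT\n", "+\n", "hhh@\n"]
print("\n".join(convert_records(record)))
```

Passing an open file handle to `convert_records` and writing out the yielded lines would convert a whole file one line at a time.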

ADD COMMENTlink modified 6.6 years ago • written 6.6 years ago by brentp22k

Thanks Brent. I noticed that you mentioned a potential issue with paired-end reads that is not handled by fastx_toolkit (http://biostar.stackexchange.com/questions/1675/filtering-paired-end-reads). Does this tool take care of that?

ADD REPLYlink written 6.6 years ago by Khader Shameer17k

If you're just converting, not filtering that won't be an issue. If you filter after the conversion (for whatever reason), then yes, you'll probably have to figure out how to make sure you get neither or both reads.
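
The "neither or both reads" requirement can be sketched as a filter that looks at both mates before keeping either. This is a minimal illustration, not a feature of any tool mentioned here. The function names, the `(name, sequence, quality)` tuple layout, and the mean-quality threshold of 20 are assumptions, and the two inputs are assumed to be in the same order with non-empty quality strings.

```python
def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (assumed non-empty)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_pairs(reads1, reads2, min_mean_q=20, offset=33):
    """Yield (read1, read2) only when BOTH mates pass the quality filter.

    Each read is a (name, sequence, quality) tuple; reads1 and reads2
    must list the mates in the same order.
    """
    for r1, r2 in zip(reads1, reads2):
        ok1 = mean_quality(r1[2], offset) >= min_mean_q
        ok2 = mean_quality(r2[2], offset) >= min_mean_q
        if ok1 and ok2:
            yield r1, r2


mates1 = [("a/1", "ACGT", "IIII"), ("b/1", "ACGT", "!!!!")]
mates2 = [("a/2", "ACGT", "IIII"), ("b/2", "ACGT", "IIII")]
for r1, r2 in filter_pairs(mates1, mates2):
    print(r1[0], r2[0])
```

Because a pair is dropped as soon as either mate fails, the two output files stay in sync.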

ADD REPLYlink written 6.6 years ago by brentp22k

Thanks. I have posted another question on filtering (http://biostar.stackexchange.com/questions/8121/whole-exome-data-cleanup-filtering). I am not sure whether filtering the reads before or after the QC makes much difference.

ADD REPLYlink written 6.6 years ago by Khader Shameer17k

I'm not so familiar with C. If I want to use this script to convert fastq illumina quality score to fastq sanger quality score, what command should I run?

ADD REPLYlink written 5.4 years ago by Chai_AF80
2
6.4 years ago by
Stanford
Weronika290 wrote:

You can use the HTSeq package in Python: http://www-huber.embl.de/users/anders/HTSeq/doc/sequences.html#sequences It will read FASTQ files with any of the common quality encodings, but always writes using the Sanger (Phred) encoding.

ADD COMMENTlink written 6.4 years ago by Weronika290
1
6.5 years ago by
Bioquant160 wrote:

A quick and dirty solution would be to make a FASTA file with Ns, or any other rarely occurring homopolymer sequence, of length equal to your read length. Align your FASTQ file against this reference with any quality-aware aligner like BWA or Bowtie to get a BAM file (you can parallelize it for speed, and the aligner needs to be told that the input qualities are Phred+64). By definition, quality scores in a BAM file are recorded as Sanger scores. All the reads will be reported only once, as unaligned reads. You can then use Picard to get the FASTQ back from the BAM file with Sanger scores!

Note: there is a difference in the way quality scores are recorded in Illumina FASTQ files pre and post version 1.3 of Casava.

http://en.wikipedia.org/wiki/FASTQ_format
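
To make the note above concrete: files from before version 1.3 store Solexa-scaled scores (offset 64, values down to -5), so a plain shift of 31 is not enough for them. A minimal sketch of the standard Solexa-to-Phred mapping, Q_phred = 10 * log10(10^(Q_solexa / 10) + 1), follows. The function names are hypothetical, and capping at Q93 (the highest score printable with offset 33) is an assumption.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def solexa64_to_sanger33(qual):
    """Re-encode a Solexa+64 quality string as Sanger (Phred+33).

    Scores are rounded to the nearest integer and capped at 93.
    """
    out = []
    for ch in qual:
        q_phred = round(solexa_to_phred(ord(ch) - 64))
        out.append(chr(min(93, q_phred) + 33))
    return "".join(out)


print(solexa64_to_sanger33("h;"))
```

The two scales nearly coincide at high quality (Solexa 40 maps to Phred 40), but diverge at the low end, where Solexa -5 maps to roughly Phred 1.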

ADD COMMENTlink written 6.5 years ago by Bioquant160
1
5.4 years ago by
Washington University, St Louis, USA
Obi Griffith16k wrote:

For anyone dealing with the various FASTQ encoding schemes and looking to sanity-check their conversion method, or who just wants to look up the Phred score for a specific ASCII code under one of the schemes: I have found this blog entry extremely useful. It provides conversion tables between Phred score (Q), ASCII glyph, and base error probability for Sanger, Illumina 1.3+, and Solexa.
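
If the tables are unavailable, a single lookup is easy to reproduce. The sketch below is not taken from the blog entry. It assumes the usual definitions: Q is the character code minus the offset (33 for Sanger, 64 for Illumina 1.3+), and the error probability is 10^(-Q/10). The function name `describe` is hypothetical, and the Solexa scale is left out because it uses a different formula.

```python
def describe(ch):
    """Return {scheme: (Q, error_probability)} for one quality character.

    A scheme maps to None when the character is below that scheme's offset.
    """
    code = ord(ch)
    rows = {}
    for name, offset in (("Sanger", 33), ("Illumina 1.3+", 64)):
        q = code - offset
        rows[name] = (q, 10 ** (-q / 10)) if q >= 0 else None
    return rows


for scheme, value in describe("I").items():
    print(scheme, value)
```

The same character means very different things under the two schemes: `I` is Q40 as Sanger but only Q9 as Illumina 1.3+, which is why a mislabelled file is worth catching early.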

ADD COMMENTlink modified 5.4 years ago • written 5.4 years ago by Obi Griffith16k

damn, the link you posted has been vandalized!

ADD REPLYlink written 4.0 years ago by Giovanni M Dall'Olio25k

Is nothing sacred? I emailed the author to let him know.

ADD REPLYlink written 4.0 years ago by Obi Griffith16k
1

Fixed. Site is back up.

ADD REPLYlink written 4.0 years ago by Obi Griffith16k