Question: Guessing The Quality Scale In FASTQ Files
Manuel (Germany) wrote, 9.5 years ago:

Is there an easy way to guess the quality scale (for example Phred+33 vs. Phred+64) from a sufficiently large FASTQ file?

Ideally I'd like some working code that I could learn from. However, neither BioPerl nor Biopython appears to contain any guessing code.

Tags: fastq, quality
modified 9.1 years ago by Sequencegeek • written 9.5 years ago by Manuel
brentp (Salt Lake City, UT) wrote, 9.5 years ago:

Have you read the Biopython code here? It has the best explanation of the quality scores I've seen.
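As a minimal sketch of what that explanation boils down to: a Phred quality score Q is defined by Q = -10 * log10(p), where p is the probability that the base call is wrong, and FASTQ stores each Q as the single character whose ASCII code is Q + offset (33 for Sanger and Illumina 1.8+, 64 for Illumina 1.3+ to 1.7). The helper names below are illustrative and not part of Biopython's API. Solexa+64 scores use a different (log-odds) formula and are deliberately not handled here.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string: each score is the ASCII code minus the offset."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Invert the Phred definition Q = -10 * log10(p) to get p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# The same character means very different things depending on the assumed offset.
print(phred_scores("I5+!", offset=33))  # [40, 20, 10, 0]
print(phred_scores("I", offset=64))     # [9]
print(error_probability(20))
```

This is why guessing matters at all: 'I' is Q40 under Phred+33 but only Q9 under Phred+64.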

There's also a nice text graphic of the encoding ranges about two-thirds of the way down the Wikipedia FASTQ page.
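The ranges in that chart can be turned into a simple consistency check: an encoding remains a candidate only if every observed quality character falls inside its range. The numbers below are assumed to match the chart's "typical" ASCII ranges; real files, especially from newer instruments, can exceed them. `ENCODING_RANGES` and `consistent_encodings` are hypothetical names.

```python
# Assumed "typical" inclusive ASCII ranges per encoding, following the
# Wikipedia FASTQ chart; real data may fall outside these bounds.
ENCODING_RANGES = {
    "Sanger (Phred+33)": (33, 73),
    "Solexa (Solexa+64)": (59, 104),
    "Illumina 1.3+ (Phred+64)": (64, 104),
    "Illumina 1.5+ (Phred+64)": (66, 105),
    "Illumina 1.8+ (Phred+33)": (33, 74),
}


def consistent_encodings(qualities: list[str]) -> list[str]:
    """Return the encodings whose range contains every observed quality character.

    Raises ValueError if no quality characters are given.
    """
    codes = [ord(ch) for qual in qualities for ch in qual]
    lowest, highest = min(codes), max(codes)
    return [
        name
        for name, (start, end) in ENCODING_RANGES.items()
        if start <= lowest and highest <= end
    ]


print(consistent_encodings(["IIIIHHGG##"]))
print(consistent_encodings(["hhhgggBB"]))
```

Both calls leave more than one candidate (the two +33 encodings, then the three +64 ones), which is typical: a small sample often cannot separate closely related encodings.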

Finally, FastQC guesses the encoding of your quality scores, so you could look at its Java code.
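A minimal sketch of the lowest-character idea, showing only the general heuristic and not a port of FastQC's Java code: Phred+33 files may use characters from '!' (33) upward, while the +64 encodings start at '@' (64), or at ';' (59) for Solexa. The function name and the cut-off of 64 are illustrative assumptions.

```python
def guess_offset_from_minimum(qualities: list[str]) -> int:
    """Guess the ASCII offset (33 or 64) from the lowest quality character seen.

    Assumed rule: a minimum below '@' (64) is taken to mean Phred+33. Solexa+64
    also uses ';' to '?' (59-63) for negative scores, so such files would be
    misreported as offset 33.
    """
    lowest = min(ord(ch) for qual in qualities for ch in qual)
    if lowest < 33:
        raise ValueError(f"character code {lowest} is below every known encoding")
    return 33 if lowest < 64 else 64


print(guess_offset_from_minimum(["IIIIHHGG##"]))  # 33
print(guess_offset_from_minimum(["hhhgggBB"]))    # 64
print(guess_offset_from_minimum(["IIIIIIII"]))    # 64, even if the file is really Phred+33
```

The last call shows the heuristic's weak spot: a sample that contains only high-quality Phred+33 characters is misread as +64.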

written 9.5 years ago by brentp

Thanks. Biopython doesn't have any guessing code, though, right? FastQC just looks at the lowest quality character seen. I guess that's the most promising approach, then, perhaps augmented by checking an upper limit too.
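Combining the lowest character with an upper limit could look like the sketch below. It assumes plain four-line FASTQ records (no wrapped sequence or quality lines), samples only the first `max_records` records, and treats 74 ('J', Q41 in Phred+33) as the highest character a Phred+33 file is expected to use; that cut-off and the function names are assumptions. When the observed range fits more than one encoding, it reports "ambiguous" instead of guessing.

```python
import io
from itertools import islice
from typing import TextIO


def quality_range(handle: TextIO, max_records: int = 10_000) -> tuple[int, int]:
    """Return the lowest and highest quality character codes in the first records.

    Assumes every record is exactly four lines: @id, sequence, '+', quality.
    """
    lowest, highest = 256, -1
    lines = (line.rstrip("\r\n") for line in handle)
    for index, line in enumerate(islice(lines, 4 * max_records)):
        if index % 4 == 3 and line:  # every fourth line holds the qualities
            codes = [ord(ch) for ch in line]
            lowest = min(lowest, min(codes))
            highest = max(highest, max(codes))
    if highest < 0:
        raise ValueError("no quality lines found")
    return lowest, highest


def classify(lowest: int, highest: int, phred33_max: int = 74) -> str:
    """Classify an observed character range using both a lower and an upper bound."""
    if lowest < 33:
        raise ValueError(f"character code {lowest} is below every known encoding")
    if lowest < 59:  # no +64 encoding uses characters below ';'
        return "phred+33"
    if highest > phred33_max:  # too high for (assumed typical) Phred+33
        return "solexa+64" if lowest < 64 else "phred+64"
    return "ambiguous"  # range fits both Phred+33 and a +64 encoding


sample = io.StringIO("@r1\nACGT\n+\nhhgB\n@r2\nACGT\n+\nIIh@\n")
low, high = quality_range(sample)
print(low, high, classify(low, high))  # 64 104 phred+64
```

For a real file, pass an open text handle, e.g. `with open(path) as fh: quality_range(fh)`; gzipped input would need `gzip.open(path, "rt")`.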

written 9.5 years ago by Manuel
Mikael Huss (Stockholm) wrote, 9.5 years ago:

Here is a Perl script for guessing the quality scale

https://www.uppnex.uu.se/content/check-fastq-quality-score-format

written 9.5 years ago by Mikael Huss

Here is the new link for this Perl tool: http://www.uppmax.uu.se/userscript/check-fastq-quality-score-format

It has been improved recently.

-- update --

You can find it in this repository under the name fastq_guessMyFormat.pl: https://github.com/NBISweden/GAAS/tree/master/annotation/Tools/Util

Here is a link to download it directly.

modified 2.8 years ago • written 5.0 years ago by Juke34

This link is broken by now as well.

written 2.8 years ago by Yahan

Thanks, updated now.

written 2.8 years ago by Juke34
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 9.5 years ago:

Does the FASTX-Toolkit answer your needs? http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastq_quality_boxplot_usage

modified 7.6 years ago by Istvan Albert • written 9.5 years ago by Pierre Lindenbaum

Hm, I would like to do this programmatically. I think something like the FastQC guesser looks more promising. Thanks, though.

written 9.5 years ago by Manuel
Ryan Thompson (TSRI, La Jolla, CA) wrote, 9.4 years ago:

I wrote a Python-based FASTQ quality guesser: https://github.com/DarwinAwardWinner/fastqident It uses Biopython's FASTQ parser, so it works on anything Biopython can parse.

written 9.4 years ago by Ryan Thompson

I'm getting a 404.

written 9.1 years ago by Jeremy Leipzig

Looks good, but it doesn't install correctly: the module "placsupport" cannot be found on PyPI.

written 7.4 years ago by xapple

The placsupport module can be found at https://github.com/DarwinAwardWinner/placsupport

written 5.3 years ago by Keith Callenberg
Marvin wrote, 9.1 years ago:

Isn't that solving the wrong problem? The guessing code in FastQC looks fragile: it simply looks at the smallest character used for qualities, so it depends on actually seeing low-quality bases.

I believe you should get the correct encoding from extra knowledge (i.e. knowing which version of which program generated the file, say from a log file), and then convert once to a well-specified format (e.g. BAM). Please don't perpetuate the practice of guessing at the details of underspecified formats.
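Once the encoding is known from such metadata, converting Phred+64 (Illumina 1.3+ to 1.7) to Phred+33, the encoding that SAM/BAM uses, is just a shift of 31 in each character code. The helper below is hypothetical and assumes the input really is Phred+64; Solexa+64 uses a different score scale and cannot be converted by a simple shift. For whole files, Biopython's `Bio.SeqIO.convert` (for example from "fastq-illumina" to "fastq-sanger") is an existing alternative.

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33; the scores are unchanged."""
    converted = []
    for ch in quality:
        code = ord(ch)
        if code < 64:  # '@' (64) encodes Q0, the lowest Phred+64 value
            raise ValueError(f"{ch!r} is not a valid Phred+64 character")
        converted.append(chr(code - 31))  # (code - 64) + 33
    return "".join(converted)


print(phred64_to_phred33("hhgB@"))  # IIH#!
```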

written 9.1 years ago by Marvin
Sequencegeek (UCLA) wrote, 9.1 years ago:

In addition to Ryan's, I have a Python-based FASTQ quality guesser as well, if you would like to use it. It is just standard Python (no Biopython). PM me if interested.

modified 9.1 years ago • written 9.1 years ago by Sequencegeek