FASTQ size: Why isn't nucleotide information merged into the quality scores?
8.9 years ago
joedever42 ▴ 20

Hi,

Each FASTQ record currently has four lines: a header, the sequence string, a "phantom" header (the "+" line), and the quality string. For storage purposes, why doesn't the format encode the nucleotide information in the quality string itself? Is the separation just there to make it more human readable?

fastq • 1.6k views

Can you give us an example of how this would be implemented?

Also, FASTQ is traditionally a merge of FASTA and QUAL files, and the layout you see is inherited from that dated origin. I'm sure people are working on optimizations, but any replacement still has to answer the "why fix what isn't broken" question satisfactorily.

FASTQ is definitely not meant to be human readable.


Are you proposing five sets of quality characters, one each for A, C, G, T, and N, so that every character in the string denotes both a base and a Phred value?

8.9 years ago
lh3 33k

FASTQ is almost never stored uncompressed. With compression, merging bases and qualities may actually hurt, because compressors are usually less efficient when different types of information are mixed in one stream. FASTQ is not meant to be read by a human from start to end, but it is meant to be inspected by eye in small portions and manipulated with the many Unix tools. These are critical features.
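One way to check the compression point on your own data is to gzip the same reads in both layouts and compare sizes. The sketch below is only an illustration under loud assumptions: the reads are synthetic (uniformly random bases and a made-up quality model), the merged layout is a hypothetical one-byte-per-base packing, and the function names are invented here. Real reads and other compressors may behave differently, so no particular outcome is claimed.

```python
import gzip
import random

BASES = "ACGT"
N_Q = 42  # assumed number of distinct Phred values (0..41)


def make_reads(n_reads=2000, length=100, seed=0):
    """Generate synthetic (sequence, Phred list) pairs; a toy model, not real data."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        seq = "".join(rng.choice(BASES) for _ in range(length))
        # Toy quality model: high scores that drift lower towards the read end.
        quals = [max(2, min(41, int(rng.gauss(38 - 10 * i / length, 3))))
                 for i in range(length)]
        reads.append((seq, quals))
    return reads


def standard_layout(reads):
    """Ordinary four-line FASTQ records with Phred+33 quality strings."""
    records = []
    for i, (seq, quals) in enumerate(reads):
        qual_str = "".join(chr(33 + q) for q in quals)
        records.append(f"@read{i}\n{seq}\n+\n{qual_str}\n")
    return "".join(records).encode("ascii")


def merged_layout(reads):
    """Hypothetical layout: header line, then one byte per (base, quality) pair.

    Byte values can collide with the newline character, so this is only
    useful for measuring size, not as a parseable format.
    """
    out = bytearray()
    for i, (seq, quals) in enumerate(reads):
        out += f"@read{i}\n".encode("ascii")
        out += bytes(BASES.index(b) * N_Q + q for b, q in zip(seq, quals))
        out += b"\n"
    return bytes(out)


reads = make_reads()
for name, layout in (("standard", standard_layout), ("merged", merged_layout)):
    raw = layout(reads)
    print(name, "raw bytes:", len(raw), "gzipped bytes:", len(gzip.compress(raw)))
```

To try it on real data, replace make_reads with a parser for your own FASTQ file. The merged bytes also cannot be grepped or viewed as text, which is the other half of the argument above.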


Yeah, I think the key here is that FASTQ files are usually gzipped (or similarly compressed) anyway.
