FastQ files - Is the content of the + line required or can I get rid of it?
8.4 years ago
BioApps

I am building a visual FastQ editor ("An efficient FastQ viewer and editor (GUI)"). I hope it will be the mother of all graphical FastQ editors :). I have already started implementing some functions, including writing output to disk. So, I have a question: is the content of the + line necessary? If I don't save that content, the resulting file will be 25% smaller, which is a HUGE difference!

@SR000066.212673 EQ length=115
ACGT
+SR000066.212673 EQ length=115           <-------- remove this content (keep +)
B:B:
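The stripping described above can be done in a single streaming pass. This is a minimal Python sketch, assuming strict four-line records (no line-wrapped sequence or quality, which some older FASTQ files use) and `\n` line endings on output; the name `strip_plus_content` is hypothetical, not part of any existing tool:

```python
from typing import Iterable, Iterator


def strip_plus_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with the optional text after '+' removed.

    Assumes strict four-line records: @header, sequence, +[optional], quality.
    """
    for i, line in enumerate(lines):
        if i % 4 == 2:
            # Third line of each record must be the '+' separator.
            if not line.startswith("+"):
                raise ValueError(f"line {i + 1}: expected '+', got {line!r}")
            yield "+\n"  # keep the separator, drop the repeated title
        else:
            yield line
```

Because it is a generator, it can be chained straight onto open file handles, e.g. `dst.writelines(strip_plus_content(src))`, without loading the whole file.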

I have seen an article saying that it can and should be removed. But since there is no official specification for the FastQ format, I worry that some tools out there MAY still use that field for god-knows what.

Yes, it is good practice to remove it: it reduces both the file size and the memory needed to open the file.

My FastQ Editor always requires the same amount of RAM (below 25 MB), no matter how big the file is.

8.4 years ago
Assa Yeroslaviz

As far as I know, you can remove the content as long as you keep the structure. There are already FastQ files where the third line is only the '+' symbol with no content.

BTW, is your tool still only for Windows machines?

Hi Frymor, the 'documentation' does indeed say that the data in the + field is optional.
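As I understand the commonly cited description of the format (Cock et al., 2010, Nucleic Acids Research), the text after '+' is optional but, when present, repeats the title from the '@' line. A reader that accepts both forms could check each record like this minimal sketch (the function name `plus_line_ok` is hypothetical):

```python
def plus_line_ok(header: str, plus: str) -> bool:
    """Check a FASTQ '+' line against its '@' header line.

    Accepts a bare '+' or '+' followed by the same title as the header.
    """
    header = header.rstrip("\r\n")
    plus = plus.rstrip("\r\n")
    if not header.startswith("@") or not plus.startswith("+"):
        return False
    # Either nothing after '+', or an exact repeat of the header title.
    return plus == "+" or plus[1:] == header[1:]
```

An editor that only writes the bare '+' still produces records that pass this check.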

> Is your tool still only for Windows machines?

Yes. For the moment it only compiles on Windows. But don't worry; there will be a Mac version and then a Linux version too.

+1 I have never used the third line for anything, and Illumina sequencers leave it empty except for the '+'.

8.4 years ago

I can only think of one operation that would benefit from the repeated identifier on the + line, and that is generating FASTA + QUAL files from your FASTQ file (simply split every two lines and replace @ and + with >). Since none of us does that operation routinely, I don't think you'll miss out on much.
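Even for that conversion, the name can be taken from the '@' line alone, so the '+' content is not strictly needed. A minimal sketch, assuming four-line records and Phred+33 ASCII quality encoding (Sanger / Illumina 1.8+), with QUAL records written as space-separated integer scores; the function name `fastq_to_fasta_qual` is hypothetical:

```python
from typing import Iterable, List, Tuple


def fastq_to_fasta_qual(lines: Iterable[str], offset: int = 33) -> Tuple[List[str], List[str]]:
    """Split FASTQ records into FASTA and QUAL lines.

    The record name comes from the '@' line only, so the optional text
    after '+' is ignored. A trailing incomplete record is skipped.
    """
    fasta: List[str] = []
    qual: List[str] = []
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) == 4:
            header, seq, _plus, quals = record
            title = header[1:]  # drop the leading '@'
            fasta += [">" + title, seq]
            # Decode each ASCII quality character to an integer Phred score.
            qual += [">" + title, " ".join(str(ord(c) - offset) for c in quals)]
            record = []
    return fasta, qual
```

The output is identical whether or not the '+' line carries the repeated title.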

I think I'd better add a checkbox to let the user decide, but leave it unchecked by default.