Am I right to use the following script to transform a fastq file (named test.fastq) to a fasta file? THANKS a lot!
In the list, seqtk, bioawk and seqret work with multi-line fastq; the rest don't. If you just want to use the standard unix tools, rtliu's sed solution is preferred, both short and efficient. It should be noted that file t.fq is put in /dev/shm and the results are written to /dev/null. In real applications, I/O may take more wall-clock time than CPU. In addition, frequently the sequence file is gzip'd. For seqtk, decompression takes more CPU time than parsing fastq.
SES observed that seqret was faster than Irsan's command. At my hand, seqret is always slower than most. Is it because of version, locale or something else?
I have not tried the native bioperl parser. Probably it is much slower. Using bioperl on large fastq is discouraged.
I do agree 4-line fastq is much more convenient for many people. However, fastq is never "defined" to consist of 4 lines. When it was first introduced at the Sanger Institute for ~1kb capillary reads, it allows multiple lines.
Tools in many ancient unix distributions (e.g. older AIX) do not work with long lines. I was working with a couple of such unix even in 2005. I guess this is why older outputs/formats, such as fasta, blast and genbank/embl, used short lines. This is not a concern any more nowadays.
To convert multi-line fastq to 4-line (or multi-line fasta to 2-line fasta): seqtk seq -l0 multi-line.fq > 4-line.fq