Each of my fastq files contains about 20M reads, and I need to split these big files into chunks of 1M reads each.
Is there an existing tool that can do this?
My idea is to count lines and start a new output file after every 1M reads (4M lines, since each fastq record takes four lines). But how can I do that with Python?
Edit: My input fastq file is actually gzip-compressed (.gz). I tried
split -l 4000000 XXX.recal.fastq.gz prefix
However, I got just one prefix-aa file, exactly the same size as the input. Is that because the file is in .gz form, so the lines cannot be counted?
When I tried
split -b 46m XXX.recal.fastq.gz prefix
it worked well! The fastq.gz was split into several smaller fastq.gz files.
So why does the -l 4000000 option not work here?
Another question: the split command only takes a prefix (plus a suffix-length option). Is there a way to set the suffix itself?
With only a prefix, the output is named XXX.fastq.gz-ab, which breaks the .gz file naming.
I would like something like XXX_1.fastq.gz instead (changing the suffix). How can I do that?
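For the Python route, here is a minimal sketch of what I have in mind, not a tested tool. It assumes every read takes exactly four lines (no wrapped sequences) and that reading the .gz through Python's gzip module is acceptable. The function name split_fastq_gz and its parameters are made up for illustration.

```python
import gzip
from itertools import islice


def split_fastq_gz(in_path, out_prefix, reads_per_chunk=1_000_000):
    """Split a gzipped FASTQ into gzipped chunks of at most reads_per_chunk reads.

    Hypothetical helper: assumes the plain 4-lines-per-read FASTQ layout.
    Returns the list of output paths, named <out_prefix>_<n>.fastq.gz.
    """
    lines_per_chunk = 4 * reads_per_chunk
    out_paths = []
    # Binary mode keeps the decompressed bytes exactly as they are.
    with gzip.open(in_path, "rb") as fin:
        index = 0
        while True:
            # Peek at the next line so no empty trailing chunk is created.
            first = next(fin, None)
            if first is None:
                break
            index += 1
            out_path = f"{out_prefix}_{index}.fastq.gz"
            with gzip.open(out_path, "wb") as fout:
                fout.write(first)
                # Stream the rest of this chunk without holding it in memory.
                fout.writelines(islice(fin, lines_per_chunk - 1))
            out_paths.append(out_path)
    return out_paths
```

Calling split_fastq_gz("XXX.recal.fastq.gz", "XXX") would then write XXX_1.fastq.gz, XXX_2.fastq.gz, and so on. Each chunk is recompressed on its own, so every output file can be decompressed independently.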