I think we all come across very large files at some point, e.g. FASTQ files. Right now I have some really large FASTQ files, around 10GB each, and I need to split them into smaller files first. I've written a Python script to do this, but its algorithm reads the whole file into memory (similar to readlines(), or capturing the output of zcat file), which leads to running out of memory (the cluster node has 60GB of RAM, but that's still not enough).
Just wondering, is there an approach that doesn't load the whole file, but instead reads it line by line? Would anyone share a script for splitting?
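One way to do this line by line is to stream the gzipped FASTQ through `gzip.open` and write out a fixed number of reads (4 lines per read) per output file, so only one record is ever held in memory. Here is a minimal sketch; the function name, `reads_per_chunk`, and the `prefix` naming scheme are my own illustrative choices, not from any library:

```python
import gzip
import itertools

def split_fastq(fastqfile, reads_per_chunk=4_000_000, prefix="chunk"):
    """Split a gzipped FASTQ into smaller files, streaming line by line.

    Assumes a well-formed FASTQ (exactly 4 lines per read). Memory use
    stays constant regardless of input size, because we never build a
    list of all lines.
    """
    with gzip.open(fastqfile, "rt") as fin:
        chunk_idx = 0
        out = None
        # iter() with a sentinel: pull 4 lines at a time until the file is empty
        records = iter(lambda: list(itertools.islice(fin, 4)), [])
        for read_idx, record in enumerate(records):
            if read_idx % reads_per_chunk == 0:
                if out:
                    out.close()
                chunk_idx += 1
                out = open(f"{prefix}_{chunk_idx}.fastq", "w")
            out.writelines(record)
        if out:
            out.close()
```

For a 10GB input this never holds more than one 4-line record in memory, so it should run fine even on a modest node.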
BTW, I'm not doing BWA paired-end mapping, but I'm curious: is it possible to run BWA on an input file of around 10GB? Thanks!
Actually I'm using Python to do the splitting. One of the key commands (this is the Python 2 `commands` module) is:
input = commands.getoutput('zcat ' + fastqfile).splitlines(True)
This seems a bit faster than readlines(), but the basic idea is still to build a list (an "array", in Perl terms). Then I can access a specific line of the list, e.g. the 1000th line.
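If all you need is random-looking access to a specific line, you don't actually need the list: you can stream the file and stop at the line you want with `itertools.islice`. A minimal sketch (the function name `nth_line` is my own, not a library call), again keeping only one line in memory at a time:

```python
import gzip
import itertools

def nth_line(fastqfile, n):
    """Return line n (1-based) of a gzipped file, or None if the file
    is shorter than n lines. Streams instead of building a list, so it
    avoids the memory cost of splitlines()/readlines()."""
    with gzip.open(fastqfile, "rt") as fh:
        return next(itertools.islice(fh, n - 1, n), None)
```

The trade-off: each call rescans the file from the start, so for many lookups it's better to iterate once and process lines as they arrive rather than indexing repeatedly.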