Hi guys,
I have a specific problem: using awk or sed to split a big file into several files. The big file has this format (3 columns):
C    SRR1_45/1    data...
U    SRR2_34/2    data...
U    SRR1_33/2    data...
C    SRR3_22/1    data...
....
I want to extract lines with SRR1 to SRR1.txt, lines with SRR2 to SRR2.txt ... lines with SRRn to SRRn.txt. The 'SRRi_' prefix should be removed from the output lines. We don't know in advance how many values of n there are.
e.g. SRR1.txt will contain:
C    45/1    data...
U    33/2    data...
I know it's easy to write a Python or Perl script to do it. But is there a shell way to do it, taking advantage of awk or sed? Let me add some details: I have 10 such big files to process, and each has more than 1000M lines, so I need an efficient way. The values of n are arbitrary; they do not come from a sequential range.
Thanks! Tao
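For reference, here is a minimal awk sketch of the split described in the question. It is only one possible approach, and it rests on a few assumptions: the prefix is everything in column 2 before the first underscore, column 1 contains no underscore, and lines without such a prefix can be skipped. The file name sample.txt is made up for the example.

```shell
# Small sample in the format from the question (sample.txt is a hypothetical name).
printf '%s\n' \
  'C    SRR1_45/1    data...' \
  'U    SRR2_34/2    data...' \
  'U    SRR1_33/2    data...' \
  'C    SRR3_22/1    data...' > sample.txt

awk '
{
    i = index($2, "_")                  # position of the first "_" in column 2
    if (i == 0) next                    # assumption: skip lines that have no prefix
    out = substr($2, 1, i - 1) ".txt"   # "SRR1_45/1" -> "SRR1.txt"
    sub(/[^ \t_]+_/, "")                # remove the first "PREFIX_" from the line, keeping the spacing
    print > out                         # awk keeps each output file open between writes
}' sample.txt
```

This reads the input once and does not need to know the prefixes in advance. One caveat: every distinct prefix becomes an open output file, and as far as I know some awk implementations stop at the system's open-file limit, while gawk works around it at some cost in speed.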
Thanks Alex! Your answer is amazing, especially the parallel way you introduced to me. Thank you so much! Best, Tao