I have a tab-delimited text file of SNP data that I need to split into smaller files, each containing the SNPs that fall within a 20 Mb window. My problem is how to split the file based on the numerical value in one of the columns.
The file looks like this (columns are separated by tabs):

    SNP ID       Physical distance
    rs_123132    12343
    rs_123134    304354
    rs_123434    8930044
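If the windows are fixed (0 to 20,000,000 bp, 20,000,000 to 40,000,000 bp, and so on), each row's window can be computed directly as position // 20,000,000. Below is a minimal sketch under that assumption, using only the standard `csv` and `pathlib` modules. It assumes one header row, the SNP ID in column 1 and the position in column 2. The function name `split_fixed_windows` and the `window_N.txt` output names are hypothetical.

```python
import csv
from pathlib import Path

WINDOW = 20_000_000  # window size in base pairs (20 Mb)


def split_fixed_windows(in_path, out_dir, window=WINDOW):
    """Write each row to the file for its fixed window: [0, window), [window, 2*window), ...

    Assumes a tab-delimited file with one header row and the position in column 2.
    Returns a dict mapping window index -> number of rows written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}  # window index -> (open file, csv writer)
    counts = {}
    with open(in_path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)  # keep the header so every output file gets one
        try:
            for row in reader:
                if not row:
                    continue  # skip blank lines
                idx = int(row[1]) // window  # integer division gives the window number
                if idx not in handles:
                    fh = open(out_dir / f"window_{idx}.txt", "w", newline="")
                    writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
                    writer.writerow(header)
                    handles[idx] = (fh, writer)
                handles[idx][1].writerow(row)
                counts[idx] = counts.get(idx, 0) + 1
        finally:
            # Close every output file, even if a row fails to parse.
            for fh, _ in handles.values():
                fh.close()
    return counts
```

Because each window has its own open file, this also works if the input is not sorted by position. For a single chromosome, the number of open files stays small.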
I need a way to track the distance spanned by the values in column 2 and, once it reaches 20,000,000 or more, write all rows in that block to a new file. This should repeat for every 20,000,000 block until the end of the file.
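If instead each block should be measured from the first SNP it contains, one approach is to remember that SNP's position. A new output file is then started as soon as a row lies 20,000,000 bp or more beyond it. Below is a minimal sketch of that interpretation. It assumes the file is sorted by position, has one header row, and stores the position in column 2. The function name `split_running_blocks` and the `block_N.txt` file names are hypothetical.

```python
import csv
from pathlib import Path

BLOCK = 20_000_000  # maximum span of a block in base pairs


def split_running_blocks(in_path, out_dir, block=BLOCK):
    """Start a new output file whenever a SNP is >= `block` bp past the
    first SNP of the current block (input assumed sorted by position).

    Returns a list with the number of rows written to each block file.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    sizes = []
    out = writer = None
    start = None  # position of the first SNP in the current block
    with open(in_path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        try:
            for row in reader:
                if not row:
                    continue  # skip blank lines
                pos = int(row[1])
                if start is None or pos - start >= block:
                    # Distance reached the threshold: this SNP opens the next block.
                    if out is not None:
                        out.close()
                    start = pos
                    out = open(out_dir / f"block_{len(sizes) + 1}.txt", "w", newline="")
                    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
                    writer.writerow(header)
                    sizes.append(0)
                writer.writerow(row)
                sizes[-1] += 1
        finally:
            if out is not None:
                out.close()
    return sizes
```

Only one output file is open at a time here, which is why the input must already be sorted by position.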
If possible I'd love to see this done in Python, as this is the language I am learning.
Thanks very much for any help!