Question

restructure rows and columns in perl or python (Interleaving columns by row pairs)

Asked 6.9 years ago by Ana

Hi all, I have a SNPs file (containing 11 million SNPs) that I was using to create a covariance matrix in Bayenv. Each column in this file corresponds to a population and each row to a SNP, but every SNP has 2 rows (one per allele), as below (2 * nsnps "rows" and npops "columns"):

7        2       2       0        6        2       2
1        0       0       0        0        0       0
0        2       2       0        0        0       0
1        0       0       0        0        0       0

So the example above has 7 populations (columns) and 2 SNPs (4 rows). I need to change the format of this file slightly. In the new file each row should correspond to one SNP, and the number of columns should be twice the number of populations, because each pair of adjacent numbers holds the two allele counts for one population. So the new file should look like this (nsnps "rows" and 2*npops "columns"):

7 1   2 0   2 0   0 0   6 0   2 0   2 0
0 1   2 0   2 0   0 0   0 0   0 0   0 0
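Stated as an index mapping (0-based; the symbols A, B, n and p are ours, not the poster's): with input A of 2n rows and p columns (here n = 2 SNPs, p = 7 populations) and output B of n rows and 2p columns, each output row i takes column j of input rows 2i and 2i+1 and places them side by side:

```latex
B_{i,\,2j} = A_{2i,\,j}, \qquad B_{i,\,2j+1} = A_{2i+1,\,j}, \qquad 0 \le i < n,\; 0 \le j < p
```

For example, B_{0,0} = A_{0,0} = 7 and B_{0,1} = A_{1,0} = 1, which are the first two numbers of the first output row.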

I have R code that does this manipulation for me, but it is very slow. Can anyone help me figure out a way to do it in Perl or Python? I am new to both, so I would appreciate any help. Thanks

Tags: perl, rows, columns, dataframe, python

Can you show your R code and tell us the size of your matrix and how much RAM you have available? Transposing a matrix should be quick in R, unless your matrix is too big and you are swapping to disk.

Reply by h.mon, 6.9 years ago

It's not exactly transposing.

Reply by WouterDeCoster, 6.9 years ago

You are right, it is not even close to transposing.

Reply by h.mon, 6.9 years ago

Which is great because there is no need to load the entire matrix.

Reply by WouterDeCoster, 6.9 years ago

Is it just me, or did someone else also not understand the output format? I feel like such a noob and still cannot figure out what the OP wants. I'm glad Wouter figured it out, but I would like to understand what the OP is trying to achieve. It would be nice to learn something new. :)

Reply by ivivek_ngs, 6.9 years ago

Ah, now I understand the format and what the OP is trying to achieve. It is not converting rows to columns wholesale. Could a moderator change the question title? Otherwise it will be misleading.

Reply by ivivek_ngs, 6.9 years ago

I changed it to "restructure", can't think of anything more specific.

Reply by WouterDeCoster, 6.9 years ago

Yes, it is much better now and readers will not be misled. Thanks, @Wouter. At least other readers who need help with this will read the question rather than simply copying the code.

Reply by ivivek_ngs, 6.9 years ago

Interleaving columns by row pairs? Interleaving columns by every two rows?

Reply by h.mon, 6.9 years ago

That's a good one! ;)

Reply by WouterDeCoster, 6.9 years ago

score 7 · Accepted Answer · 2017-06-01

Answered 6.9 years ago by WouterDeCoster

I think this should do the job:

Save as rearrangingalleles.py and execute as python rearrangingalleles.py myinput.txt > myoutput.txt
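The script itself is not preserved in this copy of the thread. As a stand-in (a minimal sketch, not WouterDeCoster's original), the code below matches the usage line above: it reads the file named on the command line two lines at a time, interleaves each pair of allele rows column by column, and writes the result to stdout. It assumes whitespace-separated counts, the two allele rows of each SNP adjacent to each other, and equal column counts in both rows; the function name `interleave_pairs` is hypothetical.

```python
import sys
from typing import Iterable, Iterator


def interleave_pairs(lines: Iterable[str]) -> Iterator[str]:
    """Yield one output row per SNP from consecutive pairs of allele rows.

    Assumes the two allele rows of each SNP are adjacent, whitespace-separated,
    and have the same number of columns (one per population).
    """
    it = iter(lines)
    for allele1 in it:
        if not allele1.strip():
            continue  # assumption: blank lines carry no data and are skipped
        allele2 = next(it, None)
        if allele2 is None:
            raise ValueError("odd number of allele rows; last SNP has no partner")
        a, b = allele1.split(), allele2.split()
        if len(a) != len(b):
            raise ValueError("allele rows have different numbers of columns")
        # Put the two counts for each population side by side: a1 b1 a2 b2 ...
        yield " ".join(f"{x} {y}" for x, y in zip(a, b))


def main(path: str) -> None:
    # Stream the input; only two lines are held in memory at a time.
    with open(path) as infile:
        for out_line in interleave_pairs(infile):
            sys.stdout.write(out_line + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python rearrangingalleles.py myinput.txt > myoutput.txt",
              file=sys.stderr)
```

Because it never loads the whole matrix, memory use stays flat no matter how many SNPs the file holds.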

More professional, using .write:

import sys

# Writes to a fixed file name, output.txt, in the current directory
output = open('output.txt', 'w')
with open(sys.argv[1]) as input:
    while True:
        # Read the two allele rows of one SNP
        line1 = input.readline().split()
        if len(line1) == 0:
            break
        line2 = input.readline().split()
        # Interleave: line1[0] line2[0] line1[1] line2[1] ...
        output.write(' '.join(a + ' ' + b for a, b in zip(line1, line2)) + '\n')
print('\n\tJob completed!')

Save as rearrangingalleles.py and execute as python rearrangingalleles.py myinput.txt

Reply by Buffo, 6.9 years ago

I firmly disagree.
It's very convenient when scripts write to stdout, because then they can be used in pipes. Also, it allows specifying both the output name and output directory.

While we're talking about being more professional:

you didn't close the output file
you should use the 'with' statement for opening files

e.g.:

with open(sys.argv[1]) as input, open('output.txt', 'w') as output:

Reply by WouterDeCoster, 6.9 years ago

Also, it allows specifying both the output name and output directory.

import os
prefix = os.path.splitext(sys.argv[1])[0]    # strip only the final extension
output = open(prefix + '_output.txt', 'w')    # you will never have to specify output name or directory

you should use the 'with' statement for opening files

??? Why?

Reply by Buffo, 6.9 years ago

It's nice that you defend your opinion, but I would suggest considering you might be wrong.

Your code will crash if the current directory is not writable by the user, which is not uncommon for directories holding (shared) data files. Also, the user doesn't have the freedom to:

Stream the output to another command on stdin (fundamental concept in unix pipelines)
Choose the output name and directory of their choice

With regard to "why using the with statement" see this page of Python for beginners.
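To make the trade-off in this exchange concrete, here is a minimal sketch (ours, not code from either poster) that supports both styles: it writes to stdout by default, so it can be piped or redirected anywhere, and writes to a file only when a path is passed via a hypothetical `-o/--output` option. The output file is opened in a `with` block, so it is closed even if an exception is raised. It assumes the question's input layout (two whitespace-separated allele rows per SNP, an even number of rows).

```python
import argparse
import sys
from typing import Iterable, Iterator, List, Optional, TextIO


def interleave_pairs(lines: Iterable[str]) -> Iterator[str]:
    """Join each pair of allele rows into one row: a1 b1 a2 b2 ..."""
    it = iter(lines)
    for first in it:
        second = next(it, None)
        if second is None:
            raise ValueError("odd number of allele rows")
        yield " ".join(f"{a} {b}" for a, b in zip(first.split(), second.split()))


def run(infile: TextIO, outfile: TextIO) -> None:
    """Stream rows from infile to outfile without loading the whole matrix."""
    for row in interleave_pairs(infile):
        outfile.write(row + "\n")


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Interleave allele rows pairwise.")
    parser.add_argument("input", help="allele count file, two rows per SNP")
    parser.add_argument("-o", "--output", help="output path (default: stdout)")
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    args = make_parser().parse_args(argv)
    with open(args.input) as infile:
        if args.output is None:
            # stdout: the caller picks the name/directory or pipes onward
            run(infile, sys.stdout)
        else:
            # 'with' closes the output file even if an exception is raised
            with open(args.output, "w") as outfile:
                run(infile, outfile)


if __name__ == "__main__":
    # Print usage instead of raising SystemExit when no arguments are given
    if len(sys.argv) > 1:
        main()
    else:
        make_parser().print_usage(sys.stderr)
```

Usage would be `python rearrangingalleles.py myinput.txt | gzip > out.txt.gz` for piping, or `python rearrangingalleles.py myinput.txt -o /some/dir/out.txt` to write a named file.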

Reply by WouterDeCoster, 6.9 years ago

You are a very funny person, WouterDeCoster :). I'm glad you're learning to program, but in addition to online courses I recommend using common sense; sometimes it is very useful!

Best.

Reply by Buffo, 6.9 years ago

Due to the limited meta-communication online I'm not sure if you are just trolling me or are genuinely an arrogant fool. Anyway, thanks for the advice.

Have a nice day.

Reply by WouterDeCoster, 6.9 years ago

Thanks so much, WouterDeCoster! It produced exactly the file I wanted in less than 1 minute.