Question: Change SNP file format
11 months ago, elhamidihay wrote:

I have tried several awk and sed commands to change the format of this SNP file, with no success. The file currently looks like this:

 ind_1      SNP_1    AA
 ind_1      SNP_2    AB
 ind_1      SNP_3    AA
 ind_2      SNP_1    AA
 ind_2      SNP_2    AA
 ind_3      SNP_1    AB
 ind_3      SNP_2    AA
 ind_3      SNP_3    AB
 ind_3      SNP_4    AA

The desired format:

           SNP_1    SNP_2    SNP_3    SNP_4
ind_1      AA       AB       AA       ??
ind_2      AA       AA       ??       ??
ind_3      AB       AA       AB       AA
11 months ago, Alex Reynolds wrote:

Here's a Python-based approach:

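#!/usr/bin/env python

import sys

d = {}  # row label -> {column label: value}
r = []  # row labels, in order of first appearance
c = []  # column labels, in order of first appearance

# Input is assumed to be tab-delimited: row, column, value
for line in sys.stdin:
    if not line.strip():
        continue  # skip blank lines
    (row, col, val) = line.strip().split('\t')
    if row not in d:
        d[row] = {}
        r.append(row)
    if col not in d[row]:
        d[row][col] = val  # keep the first value seen for this cell
    if col not in c:
        c.append(col)

# Header row: an empty corner cell, then the column labels
sys.stdout.write("\t%s\n" % ('\t'.join(c)))
for row in r:
    nr = []
    for col in c:
        try:
            nr.append(d[row][col])
        except KeyError:
            nr.append('??')  # filler for a missing row/column combination
    sys.stdout.write("%s\t%s\n" % (row, '\t'.join(nr)))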

Then:

$ ./condense.py < data.txt
        SNP_1   SNP_2   SNP_3   SNP_4
ind_1   AA      AB      AA      ??
ind_2   AA      AA      ??      ??
ind_3   AB      AA      AB      AA
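
The script above expects tab-delimited input. If the file is aligned with spaces instead, as the sample in the question appears to be, a whitespace-tolerant variant may help. The following is a minimal sketch rather than the answer's own code: the function name `pivot_lines` and its `filler` parameter are made up for illustration, it assumes no field contains embedded whitespace, and it relies on dictionaries preserving insertion order (Python 3.7 or later).

```python
def pivot_lines(lines, filler="??"):
    """Pivot 'row col value' records into wide, tab-separated lines.

    Hypothetical helper: splits on any run of whitespace and keeps the
    first value seen for a given (row, col) pair.
    """
    table = {}  # row label -> {column label: value}, in first-seen order
    cols = {}   # column labels in first-seen order (dict used as ordered set)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        row, col, val = fields
        table.setdefault(row, {}).setdefault(col, val)
        cols.setdefault(col, None)
    # Header row: an empty corner cell, then the column labels
    out = ["\t" + "\t".join(cols)]
    for row, cells in table.items():
        out.append(row + "\t" + "\t".join(cells.get(c, filler) for c in cols))
    return out
```

Something like `print("\n".join(pivot_lines(open("data.txt"))))` would then print the wide table, with `data.txt` standing in for the input file.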

That worked! Thank you.

Reply written 11 months ago by elhamidihay
11 months ago, Pierre Lindenbaum wrote:

Use GNU datamash: https://www.gnu.org/software/datamash/examples/#example_transpose


Unfortunately, datamash did not give me the format I need.

Reply written 11 months ago by elhamidihay

Output:

$ datamash crosstab 1,2 unique 3 --filler='??' < data.txt
        SNP_1   SNP_2   SNP_3   SNP_4
ind_1   AA      AB      AA      ??
ind_2   AA      AA      ??      ??
ind_3   AB      AA      AB      AA

Input:

$ cat data.txt 
ind_1   SNP_1   AA
ind_1   SNP_2   AB
ind_1   SNP_3   AA
ind_2   SNP_1   AA
ind_2   SNP_2   AA
ind_3   SNP_1   AB
ind_3   SNP_2   AA
ind_3   SNP_3   AB
ind_3   SNP_4   AA
Reply modified 11 months ago, written 11 months ago by cpad0112
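
For readers who want to see what the crosstab step amounts to, here is a rough standard-library sketch. It is an illustration only, not datamash's implementation: the function name `crosstab_unique` is hypothetical, row and column labels are sorted as a simplifying assumption, and a cell with several distinct values is rendered as a sorted, comma-separated list, which is how I understand datamash's `unique` operation to behave.

```python
from collections import defaultdict


def crosstab_unique(records, filler="??"):
    """Cross-tabulate (row, col, value) tuples into a wide table.

    Hypothetical helper. Unlike a first-value-wins pivot, every distinct
    value for a (row, col) pair is kept and joined with commas.
    """
    cells = defaultdict(set)  # (row, col) -> set of distinct values
    for row, col, val in records:
        cells[(row, col)].add(val)
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    table = [[""] + cols]  # header row; the top-left corner is empty
    for r in rows:
        table.append(
            [r] + [",".join(sorted(cells[(r, c)])) if (r, c) in cells else filler
                   for c in cols]
        )
    return table
```

Note that plain string sorting would place `SNP_10` before `SNP_2`; a natural-sort key would be needed if the labels carry multi-digit numbers.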

"it doesn't work"

Reply written 11 months ago by Pierre Lindenbaum