Change SNP file format
6.4 years ago
elhamidihay ▴ 30

I have tried several awk and sed commands to reformat this SNP file, without success. The file has one individual, SNP, and genotype per line, like this:

 ind_1      SNP_1    AA
 ind_1      SNP_2    AB
 ind_1      SNP_3    AA
 ind_2      SNP_1    AA
 ind_2      SNP_2    AA
 ind_3      SNP_1    AB
 ind_3      SNP_2    AA
 ind_3      SNP_3    AB
 ind_3      SNP_4    AA

The desired format:

           SNP_1    SNP_2    SNP_3    SNP_4
ind_1      AA       AB       AA       ??
ind_2      AA       AA       ??       ??
ind_3      AB       AA       AB       AA
SNP perl python format • 3.1k views
6.4 years ago

Here's a Python-based approach:

#!/usr/bin/env python

import sys

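d = {}  # d[row][col] = val
r = []  # row labels (individuals), in order of first appearance
c = []  # column labels (SNPs), in order of first appearance

for line in sys.stdin:
    fields = line.split()  # split on any run of tabs or spaces
    if len(fields) != 3:
        continue  # skip blank or malformed lines
    (row, col, val) = fields
    if row not in d:
        d[row] = {}
        r.append(row)
    if col not in d[row]:
        d[row][col] = val  # keep the first value seen for a pair
    if col not in c:
        c.append(col)

# header line: empty corner cell, then the SNP names
sys.stdout.write("\t%s\n" % ('\t'.join(c)))
for row in r:
    # genotypes missing for this individual are filled with '??'
    nr = [d[row].get(col, '??') for col in c]
    sys.stdout.write("%s\t%s\n" % (row, '\t'.join(nr)))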

Then:

$ ./condense.py < data.txt
        SNP_1   SNP_2   SNP_3   SNP_4
ind_1   AA      AB      AA      ??
ind_2   AA      AA      ??      ??
ind_3   AB      AA      AB      AA
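
The same reshaping can also be wrapped in a function, which makes it easier to reuse and test. The following is only a sketch, not part of the original answer: the name `long_to_wide` and its `filler` parameter are made up for illustration, and it assumes Python 3.7+ (where dicts preserve insertion order) and whitespace-separated input with exactly three fields per line.

```python
def long_to_wide(lines, filler="??"):
    """Pivot (individual, SNP, genotype) lines into a list of table rows.

    Hypothetical helper: the first row returned is the header, and each
    following row is [individual, genotype for SNP 1, genotype for SNP 2, ...].
    """
    table = {}  # individual -> {snp: genotype}, in order of first appearance
    snps = {}   # dict used as an ordered set of SNP names
    for line in lines:
        fields = line.split()  # any run of spaces or tabs separates fields
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        ind, snp, genotype = fields
        # setdefault keeps the first genotype seen for an (individual, SNP) pair
        table.setdefault(ind, {}).setdefault(snp, genotype)
        snps.setdefault(snp, None)
    header = [""] + list(snps)
    rows = [[ind] + [calls.get(snp, filler) for snp in snps]
            for ind, calls in table.items()]
    return [header] + rows
```

Passing an open file handle for data.txt to `long_to_wide` and printing each returned row joined by tabs should give the same layout as the script's output.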

that worked! thank you

Unfortunately, datamash did not give me the format I need.

Output:

$ datamash crosstab 1,2 unique 3 --filler='??' < data.txt
        SNP_1   SNP_2   SNP_3   SNP_4
ind_1   AA      AB      AA      ??
ind_2   AA      AA      ??      ??
ind_3   AB      AA      AB      AA

Input:

$ cat data.txt 
ind_1   SNP_1   AA
ind_1   SNP_2   AB
ind_1   SNP_3   AA
ind_2   SNP_1   AA
ind_2   SNP_2   AA
ind_3   SNP_1   AB
ind_3   SNP_2   AA
ind_3   SNP_3   AB
ind_3   SNP_4   AA
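
One difference from the Python script earlier in the thread: datamash's `unique` operation reports every distinct value in a cell as a comma-separated list, while the script keeps only the first genotype seen for an individual/SNP pair. A rough Python sketch of the "collect distinct values per cell" idea is below. It is an illustration, not datamash's implementation; the name `crosstab_unique` is hypothetical, and sorting the row and column labels is a choice made here (plain string sorting, so `SNP_10` would sort before `SNP_2`).

```python
from collections import defaultdict


def crosstab_unique(records, filler="??"):
    """Build a wide table whose cells list the distinct values seen.

    records: iterable of (row_label, column_label, value) tuples.
    Returns a list of rows; the first row is the header.
    """
    cells = defaultdict(set)  # (row, col) -> set of distinct values
    for row, col, val in records:
        cells[(row, col)].add(val)
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    table = [[""] + cols]
    for r in rows:
        line = [r]
        for c in cols:
            if (r, c) in cells:
                # several distinct genotypes are joined with commas
                line.append(",".join(sorted(cells[(r, c)])))
            else:
                line.append(filler)  # no record for this pair
        table.append(line)
    return table
```

With the data.txt shown above, no pair is repeated, so every cell holds either a single genotype or the filler.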