Question: Change SNP file format
9 months ago, elhamidihay wrote:

I have tried several awk and sed commands to change the format of this SNP file, with no success. The file has one genotype per line and looks like this:

 ind_1      SNP_1    AA
 ind_1      SNP_2    AB
 ind_1      SNP_3    AA
 ind_2      SNP_1    AA
 ind_2      SNP_2    AA
 ind_3      SNP_1    AB
 ind_3      SNP_2    AA
 ind_3      SNP_3    AB
 ind_3      SNP_4    AA

The desired format, with ?? marking missing genotypes:

        SNP_1      SNP_2    SNP_3      SNP_4
ind_1      AA       AB       AA         ??
ind_2      AA       AA       ??         ??
ind_3      AB       AA       AB         AA
Tags: snp, format, python, perl
9 months ago, Alex Reynolds wrote:

Here's a Python-based approach:

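#!/usr/bin/env python

import sys

d = {}  # d[row][col] -> genotype, e.g. d['ind_1']['SNP_2'] == 'AB'
r = []  # row labels (individuals), in order of first appearance
c = []  # column labels (SNPs), in order of first appearance

# Read tab-delimited (individual, SNP, genotype) triples from stdin
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    (row, col, val) = line.split('\t')
    if row not in d:
        d[row] = {}
        r.append(row)
    if col not in d[row]:
        d[row][col] = val  # keep the first genotype seen for a given pair
    if col not in c:
        c.append(col)

# Header line: empty corner cell, then one column per SNP
sys.stdout.write("\t%s\n" % ('\t'.join(c)))
for row in r:
    # Fill in '??' wherever an individual has no genotype for a SNP
    nr = [d[row].get(col, '??') for col in c]
    sys.stdout.write("%s\t%s\n" % (row, '\t'.join(nr)))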

Then:

$ ./condense.py < data.txt
        SNP_1   SNP_2   SNP_3   SNP_4
ind_1   AA      AB      AA      ??
ind_2   AA      AA      ??      ??
ind_3   AB      AA      AB      AA
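
If the file is aligned with spaces rather than tabs, as the sample in the question appears to be, splitting on tabs will fail. Below is a minimal sketch of the same pivot written as a function that splits on any run of whitespace. The name long_to_wide and its missing parameter are made up for illustration, and the sketch assumes no field contains spaces:

```python
def long_to_wide(lines, missing="??"):
    """Pivot (individual, SNP, genotype) records into tab-joined wide rows."""
    table = {}  # table[individual][snp] -> genotype
    rows = []   # individuals, in order of first appearance
    cols = []   # SNPs, in order of first appearance
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue           # ignore blank lines
        if len(fields) != 3:
            raise ValueError("expected 3 fields, got: %r" % line)
        row, col, val = fields
        if row not in table:
            table[row] = {}
            rows.append(row)
        if col not in cols:
            cols.append(col)
        table[row].setdefault(col, val)  # first genotype seen wins
    out = ["\t".join([""] + cols)]       # header with an empty corner cell
    for row in rows:
        cells = [table[row].get(col, missing) for col in cols]
        out.append("\t".join([row] + cells))
    return out


# Small in-memory demo mixing space-aligned and tab-delimited lines
sample = [
    " ind_1      SNP_1    AA",
    " ind_1      SNP_2    AB",
    "ind_2\tSNP_1\tAA",
]
for line in long_to_wide(sample):
    print(line)
```

To run it on a file, pass an open file handle (or sys.stdin) as the lines argument.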

That worked! Thank you.

written 9 months ago by elhamidihay
9 months ago, Pierre Lindenbaum wrote:

Use GNU datamash: https://www.gnu.org/software/datamash/examples/#example_transpose


Unfortunately, datamash did not give me the format I need.

written 9 months ago by elhamidihay

Output:

$ datamash crosstab 1,2 unique 3 --filler='??' < data.txt
        SNP_1   SNP_2   SNP_3   SNP_4
ind_1   AA      AB      AA      ??
ind_2   AA      AA      ??      ??
ind_3   AB      AA      AB      AA

Input:

$ cat data.txt 
ind_1   SNP_1   AA
ind_1   SNP_2   AB
ind_1   SNP_3   AA
ind_2   SNP_1   AA
ind_2   SNP_2   AA
ind_3   SNP_1   AB
ind_3   SNP_2   AA
ind_3   SNP_3   AB
ind_3   SNP_4   AA
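
For comparison, here is a rough Python sketch of the idea behind crosstab 1,2 unique 3: group by the first two columns and report the distinct values of the third. It is not datamash's implementation. The name crosstab_unique is hypothetical, and keeping labels in order of first appearance is an assumption of this sketch. Unlike a first-value-wins approach, it keeps conflicting genotypes visible as a comma-separated list:

```python
def crosstab_unique(records, filler="??"):
    """Cross-tabulate (row, col, value) triples, joining distinct values per cell."""
    cells = {}  # (row, col) -> set of distinct values
    rows = []   # row labels, in order of first appearance (assumption of this sketch)
    cols = []   # column labels, in order of first appearance
    for row, col, val in records:
        if row not in rows:
            rows.append(row)
        if col not in cols:
            cols.append(col)
        cells.setdefault((row, col), set()).add(val)
    table = [[""] + cols]  # header with an empty corner cell
    for row in rows:
        line = [row]
        for col in cols:
            values = cells.get((row, col))
            # Sorted, comma-joined distinct values, or the filler if the cell is empty
            line.append(",".join(sorted(values)) if values else filler)
        table.append(line)
    return table


records = [
    ("ind_1", "SNP_1", "AA"),
    ("ind_1", "SNP_1", "AB"),  # conflicting duplicate, reported as "AA,AB"
    ("ind_2", "SNP_2", "AA"),
]
for line in crosstab_unique(records):
    print("\t".join(line))
```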
modified 9 months ago • written 9 months ago by cpad0112

"It doesn't work"

written 9 months ago by Pierre Lindenbaum