Question: reformat a gene association file
lessismore (Mexico) wrote, 10 months ago:

Hey all,

I have a tab-separated text file with 3 columns:
1st column: a gene ID
2nd column: a value
3rd column: a comma-separated list of the genes associated with the gene in the 1st column

TMCS09g1008699 6.4 TMCS09g1008677, TMCS09g1008681, TMCS09g1008685
TMCS09g1008690 5.3 TMCS09g1008686, TMCS09g1008680, TMCS09g1008675

etc.

What I want is this:

TMCS09g1008699 6.4 TMCS09g1008677
TMCS09g1008699 6.4 TMCS09g1008681
TMCS09g1008699 6.4 TMCS09g1008685
TMCS09g1008690 5.3 TMCS09g1008686
TMCS09g1008690 5.3 TMCS09g1008680
TMCS09g1008690 5.3 TMCS09g1008675

Could someone help me?

Tags: awk, bash, R
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 10 months ago:
 awk '{for(i=3;i<=NF;i++) print $1,$2,$i;}' input.txt  | sed 's/,$//'
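To see why this one-liner works on the example but not on the real file, here is a small sketch that wraps it in a shell function (`expand_ws` is a hypothetical name, not from the thread) and feeds it one line in each format. It assumes a POSIX `sh` with `awk` and `sed` available.

```shell
#!/bin/sh
# Hypothetical helper wrapping Pierre's one-liner so it reads from stdin.
expand_ws() {
  # Default FS splits on runs of blanks/tabs, so with ", " between genes
  # each gene becomes its own field ($3, $4, ...) with a trailing comma;
  # sed then strips a comma left at the end of each output line.
  awk '{for(i=3;i<=NF;i++) print $1,$2,$i;}' | sed 's/,$//'
}

# Format of the question's example (", " between genes).
printf 'TMCS09g1008699\t6.4\tTMCS09g1008677, TMCS09g1008681, TMCS09g1008685\n' | expand_ws

# Format of the real input (no space after commas): the whole list is $3.
printf 'TMCS09g1008699\t6.4\tTMCS09g1008677,TMCS09g1008681,TMCS09g1008685\n' | expand_ws
```

The first call prints three lines, one per gene. The second prints a single line with the comma-joined list untouched, because without spaces awk sees only three fields and the line has no trailing comma for sed to remove.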

lessismore replied, 10 months ago:

Dear Pierre, I made a mistake: in my real input there is no space after the commas in the 3rd column, and now I get the wrong output.
h.mon (Brazil) wrote, 10 months ago:

Combine Pierre's answer with Tom Fenech's answer on StackOverflow.

Three hints: 1) pay attention to which column you split; 2) you will have two awk statements (a split() call and a for loop) separated by a semicolon; and 3) you will not need the sed command.
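As an illustration of the split() part of the hints (not the full answer), the sketch below shows that awk's `split(string, array, separator)` fills `array[1]` to `array[n]` and returns `n`, which is what lets a `for` loop visit each gene. The function name `show_split` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print how many pieces split() found, then each piece.
show_split() {
  awk '{
    n = split($0, a, ",")   # n = number of comma-separated pieces
    print n
    for (i = 1; i <= n; i++) print a[i]
  }'
}

printf 'TMCS09g1008677,TMCS09g1008681,TMCS09g1008685\n' | show_split
```

This prints `3` followed by the three gene IDs, one per line.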


lessismore replied, 10 months ago:

Got it, thanks:

awk 'BEGIN{FS=OFS="\t"} {n=split($3,a,","); for(i=1;i<=n;i++) print $1,$2,a[i]}' input.txt
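The one-liner above assumes there is never a space after a comma. If a file might contain both forms (as the question's example and the real input did), one option is to give `split()` a regular expression instead of a single character. A sketch, assuming tab-separated input, with the hypothetical function name `expand_assoc`:

```shell
#!/bin/sh
# Hypothetical helper: one tab-separated output row per associated gene.
# Assumes column 3 holds genes separated by a comma plus optional spaces.
expand_assoc() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         # ", *" is treated as a regex: a comma followed by any spaces
         n = split($3, genes, ", *")
         for (i = 1; i <= n; i++) print $1, $2, genes[i]
       }' "$@"   # reads the files given as arguments, or stdin if none
}

printf 'TMCS09g1008699\t6.4\tTMCS09g1008677,TMCS09g1008681\n' | expand_assoc
printf 'TMCS09g1008690\t5.3\tTMCS09g1008686, TMCS09g1008680\n' | expand_assoc
```

Because `", *"` is longer than one character, awk treats it as an extended regular expression rather than a literal separator. A line whose 3rd column is empty makes `split()` return 0, so it produces no output row; add a check on `n` if such genes should be kept. Usage on a file: `expand_assoc input.txt`.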