Question: reformat a gene association file
7 months ago by
lessismore510
Mexico
lessismore510 wrote:

Hey all,

I have a tab-separated text file with 3 columns:
1st column: a gene ID
2nd column: a value
3rd column: a comma-separated list of genes associated with the gene in the 1st column

TMCS09g1008699 6.4 TMCS09g1008677, TMCS09g1008681, TMCS09g1008685
TMCS09g1008690 5.3 TMCS09g1008686, TMCS09g1008680, TMCS09g1008675

etc.

What I want is this:

TMCS09g1008699 6.4 TMCS09g1008677
TMCS09g1008699 6.4 TMCS09g1008681
TMCS09g1008699 6.4 TMCS09g1008685
TMCS09g1008690 5.3 TMCS09g1008686
TMCS09g1008690 5.3 TMCS09g1008680
TMCS09g1008690 5.3 TMCS09g1008675

Could someone help me?

awk bash R • 270 views
modified 7 months ago by h.mon20k • written 7 months ago by lessismore510
7 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum113k wrote:
 awk '{for(i=3;i<=NF;i++) print $1,$2,$i;}' input.txt  | sed 's/,$//'
written 7 months ago by Pierre Lindenbaum113k

Dear Pierre, I made a mistake: in my input there are no spaces in the 3rd column, so now I get the wrong output.

modified 7 months ago • written 7 months ago by lessismore510
7 months ago by
h.mon20k
Brazil
h.mon20k wrote:

Combine Pierre's answer with Tom Fenech's answer at StackOverflow.

Three hints: 1) pay attention to which column you split; 2) you will have two awk statements, separated by a semicolon; and 3) you will not need the sed command.

written 7 months ago by h.mon20k

Got it, thanks.

awk 'BEGIN{FS=OFS="\t"} {n=split($3,a,",");for(i=1;i<=n;i++) print $1,$2,a[i]}' input.txt
modified 7 months ago • written 7 months ago by lessismore510
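
For anyone landing here with the same problem, the split-based approach can be sketched as a small shell function. This is only an illustration under a few assumptions: the input is tab-separated, the function name expand_genes is hypothetical, and the separator regex (a comma followed by optional spaces) is an addition not used in the thread, so that lists written with or without a space after each comma both work.

```shell
# expand_genes: hypothetical helper name, not from the thread.
# Reads the named file(s), or standard input when no file is given.
expand_genes() {
    awk 'BEGIN { FS = OFS = "\t" }   # tab-separated input and output
         {
             # Split column 3 on a comma plus any spaces that follow it.
             n = split($3, genes, /, */)
             # Emit one row per associated gene, repeating columns 1 and 2.
             for (i = 1; i <= n; i++) print $1, $2, genes[i]
         }' "$@"
}

# Sample row built with printf so the tab characters are explicit.
printf 'TMCS09g1008699\t6.4\tTMCS09g1008677,TMCS09g1008681,TMCS09g1008685\n' | expand_genes
```

On a real file this would be called as expand_genes input.txt > output.txt. Rows whose third column is empty produce no output lines, since split then returns 0.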