Question: reformat a gene association file
5 weeks ago, lessismore (Mexico) wrote:

Hey all,

I have a tab-separated text file with 3 columns:
1st column: a gene ID
2nd column: a value
3rd column: a comma-separated list of genes associated with the gene in the 1st column

TMCS09g1008699 6.4 TMCS09g1008677, TMCS09g1008681, TMCS09g1008685
TMCS09g1008690 5.3 TMCS09g1008686, TMCS09g1008680, TMCS09g1008675

etc.

What I want is this:

TMCS09g1008699 6.4 TMCS09g1008677
TMCS09g1008699 6.4 TMCS09g1008681
TMCS09g1008699 6.4 TMCS09g1008685
TMCS09g1008690 5.3 TMCS09g1008686
TMCS09g1008690 5.3 TMCS09g1008680
TMCS09g1008690 5.3 TMCS09g1008675

Could someone help me?

Tags: awk, bash, R
5 weeks ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
 awk '{for(i=3;i<=NF;i++) print $1,$2,$i;}' input.txt  | sed 's/,$//'

Dear Pierre, I made a mistake: in my actual input there is no space after the commas in the 3rd column, so now I get the wrong output.

Reply written 5 weeks ago by lessismore
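
For anyone hitting the same problem, here is a small sketch of why the command above stops working once the spaces are gone. The one-line sample is made up from the IDs in the question. With awk's default whitespace splitting, the whole comma-separated list is a single field, so the loop from 3 to NF runs only once.

```shell
# Hypothetical sample line: tab-separated, no space after the comma.
printf 'TMCS09g1008699\t6.4\tTMCS09g1008677,TMCS09g1008681\n' |
  awk '{ print NF }'
# prints 3: the gene list counts as one field

# The same line through the command from the answer above.
printf 'TMCS09g1008699\t6.4\tTMCS09g1008677,TMCS09g1008681\n' |
  awk '{for(i=3;i<=NF;i++) print $1,$2,$i;}' | sed 's/,$//'
# prints: TMCS09g1008699 6.4 TMCS09g1008677,TMCS09g1008681
```

So the 3rd column has to be split on the commas explicitly.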
5 weeks ago, h.mon (Brazil) wrote:

Combine Pierre's answer with Tom Fenech's answer on Stack Overflow.

Three hints: 1) pay attention to which column you split; 2) you will have two awk statements, separated by a semicolon; and 3) you will not need the sed command.


Got it, thanks:

awk 'BEGIN{FS=OFS="\t"} {n=split($3,a,","); for(i=1;i<=n;i++) print $1,$2,a[i]}' input.txt
Reply written 5 weeks ago by lessismore
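
A possible variation, in case the file mixes both styles (with and without a space after the commas): split the 3rd column on a comma followed by optional spaces. This is only a sketch; the function name `expand_genes` is made up for illustration, and it assumes the three columns really are tab-separated.

```shell
# expand_genes: hypothetical helper, reads the named files or stdin.
# Prints one tab-separated row per gene listed in the 3rd column.
expand_genes() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         # Split on a comma plus any spaces that follow it.
         n = split($3, a, /, */)
         for (i = 1; i <= n; i++) print $1, $2, a[i]
       }' "$@"
}

# Usage on a made-up line that mixes both separator styles.
printf 'TMCS09g1008699\t6.4\tTMCS09g1008677, TMCS09g1008681,TMCS09g1008685\n' |
  expand_genes
```

On that sample line it should give three rows, one per associated gene, each keeping the gene ID and value from the first two columns. With a real file it would be called as `expand_genes input.txt`.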