Question: Reformat my txt file
tianshenbio (50) wrote 7 weeks ago:

I have a txt file that looks like this:

A a,b,q
B d
D f,m

How can I convert it to:

A a
A b
A q
B d
D f
D m
Tags: rna-seq gene ontology • 144 views
Modified 7 weeks ago • written 7 weeks ago by tianshenbio (50)

How is this related to bioinformatics?

Written 7 weeks ago by Pierre Lindenbaum (128k)

bruce.moran (790), Ireland, wrote 7 weeks ago:

There are lots of ways to do this; pick a language you would like to learn and then see whether you can use it.

I picked Perl because the Python course was full.

This is command-line Perl, which is really useful for this kind of simple parsing:

perl -ane '@s=split(/\,/, $F[1]); foreach $k (@s){print "$F[0] $k\n";}' txt.txt
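For comparison, here is a sketch of the same split-and-print idea in awk, which ships with most Unix-like systems. It assumes, as in the question, that the ID and its list are separated by whitespace and that the list itself contains no spaces; explode_list is a hypothetical function name.

```shell
# Sketch: expand "ID item1,item2,..." into one "ID item" line per item.
# Assumes whitespace between ID and list, and no spaces inside the list.
# explode_list is a hypothetical helper; it reads the named file(s) or stdin.
explode_list() {
    awk '{
        n = split($2, items, ",")     # break the second column on commas
        for (i = 1; i <= n; i++)
            if (items[i] != "")       # skip empty items, e.g. from a trailing comma
                print $1, items[i]    # OFS defaults to a single space
    }' "$@"
}

# Usage with the sample from the question:
printf 'A a,b,q\nB d\nD f,m\n' | explode_list
```

On the question's sample this prints the six requested lines, A a through D m. To run it on a file, use `explode_list txt.txt`.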
Written 7 weeks ago by bruce.moran (790)

Hi,

Thank you for your reply. What if there is a space after each comma?

Written 7 weeks ago by tianshenbio (50)

Please let us know how this question is related to bioinformatics or the post will be closed.

Written 7 weeks ago by RamRS (27k)

I think it's a toy example, and the A, B, etc. are just placeholders. The user has some genuine bioinformatics questions.

Fair enough if you want to close it.

Written 7 weeks ago by bruce.moran (790)

I'm sure that is the case, but the OP needs to add details on how this is related to bioinformatics for one simple reason: this might be a known or intermediate file format that others might encounter, and they may have the same question. Without context, there is no way they can locate this post and use your answer. In essence, the OP's post as it stands only has value to them and not to the community at large, and we do not encourage such posts.

Written 7 weeks ago by RamRS (27k)

If you have this:

A b, c, d, e
B f, g, h,

then use:

awk -F',| ' '{for (i=2;i<=NF;i++){if (length($i) > 0){print $1,$i}}}' txt.txt

It even solves this:

A b,c, d,e,f, g,h, i, o
B h,d, y,u, i, o,
C h  f d,d g      k    l

It gives the same output even if there are multiple spaces or commas between two letters.
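Why this awk approach works: with -F',| ' every single comma or single space is a field separator, so a run such as ", " or "   " produces empty fields, and the length($i) > 0 test skips them. Below, the command from this reply is wrapped in a shell function (explode_messy is a hypothetical name). It assumes the ID starts at the very first character of the line, because a leading blank would make $1 empty.

```shell
# Sketch: the awk command from this reply, wrapped in a reusable function.
# -F',| ' splits on each single comma or single space; consecutive separators
# leave empty fields, which the length($i) > 0 check drops.
# Assumes the ID starts in column 1 (a leading blank would make $1 empty).
explode_messy() {
    awk -F',| ' '{
        for (i = 2; i <= NF; i++)   # field 1 is the ID; the rest are items
            if (length($i) > 0)
                print $1, $i
    }' "$@"
}

printf 'A b, c, d, e\nB f, g, h,\n' | explode_messy
```

For these two input lines it prints A b, A c, A d, A e, B f, B g and B h, one per line; the trailing comma on the B line yields only an empty field, which is skipped.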

Modified 7 weeks ago by RamRS (27k) • written 7 weeks ago by Diedes (20)

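Because -a autosplits each line on whitespace, $F[1] holds only the first item once there are spaces after the commas. Join the remaining fields back together and split on a comma with optional whitespace around it. Change:

@s=split(/\,/, $F[1])

to

@s=split(/\s*,\s*/, join(" ", @F[1..$#F]))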
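For readers without Perl, here is a sketch of a space-tolerant comma split in awk. Unlike splitting on every space, it breaks only at commas, with optional spaces or tabs on either side, so multi-word items would stay intact. It assumes the ID is the first whitespace-delimited word on each line; explode_commas is a hypothetical function name.

```shell
# Sketch: split only on commas, allowing blanks (spaces/tabs) around them.
# Assumes the ID is the first whitespace-delimited word on the line.
# explode_commas is a hypothetical helper; it reads the named file(s) or stdin.
explode_commas() {
    awk '{
        if (NF < 2) next                        # no list on this line
        id = $1
        rest = $0
        sub(/^[ \t]*[^ \t]+[ \t]+/, "", rest)   # drop the ID and the blanks after it
        sub(/[ \t]+$/, "", rest)                # drop trailing blanks
        n = split(rest, items, "[ \t]*,[ \t]*") # comma plus any surrounding blanks
        for (i = 1; i <= n; i++)
            if (items[i] != "")                 # skip empties, e.g. from a trailing comma
                print id, items[i]
    }' "$@"
}

printf 'A b, c, d, e\nB f, g, h,\n' | explode_commas
```

The trade-off compared with splitting on every space is that an item such as "gene one" is kept as a single entry rather than split into two.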
Modified 7 weeks ago by RamRS (27k) • written 7 weeks ago by bruce.moran (790)

Please do not answer questions that are unrelated to bioinformatics. We are not StackOverflow. If OP is not willing to show us how their question is related to bioinformatics, do not encourage them.

Written 7 weeks ago by RamRS (27k)
Powered by Biostar version 2.3.0