Question: arranging columns and rows
Asked 16 months ago by Ambika (United States/Auburn/Auburn University):

Hello everyone,

I have File 1 like this with 2 columns:

g4989   2.70224323450382
g4650   2.71483380183318
g11701  2.83907744860811
g11701  2.83907744860811
g3807   2.83912968405616
g17931  2.84821618321646

and File 2 like this, with 4 columns:

g4989
g4650  Pfam    PF00172 FungalZn(2)-Cys(6)binuclearclusterdomain
g11701  Pfam    PF04082 Fungalspecifictranscriptionfactordomain
g17931  Pfam    PF04082 Fungalspecifictranscriptionfactordomain

Both files are tab-delimited. File 2 contains only a selected subset of the genes in File 1. I want to add the second column from File 1 to File 2, but only for the genes in File 2, like this:

    g4989     2.70 
    g4650     2.71         Pfam    PF00172 FungalZn(2)-Cys(6)binuclearclusterdomain
    g11701   2.83         Pfam    PF04082 Fungalspecifictranscriptionfactordomain
    g17931   2.84         Pfam    PF04082 Fungalspecifictranscriptionfactordomain

Could you please help me sort this out in Linux?

Thank you, Ambika

Tags: bash, awk, grep

g11701 is present twice in file1. How should this be handled?

Comment written 16 months ago by Pierre Lindenbaum

Yes, it's present twice, and this is just a sample. Some genes might be present more often than that, because a single gene can have several different Pfam domains.

Comment written 16 months ago by Ambika
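
Before merging, it can help to see which keys repeat. The following is a minimal sketch, not part of the original thread: it rebuilds the File 1 sample from the question in a temporary directory (the file name file1.txt and the variable dup_keys are only illustrative) and lists every gene ID that occurs more than once in column 1.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# File 1 sample from the question (tab-delimited: gene ID, value).
printf '%s\t%s\n' \
    g4989  2.70224323450382 \
    g4650  2.71483380183318 \
    g11701 2.83907744860811 \
    g11701 2.83907744860811 \
    g3807  2.83912968405616 \
    g17931 2.84821618321646 > file1.txt

# cut keeps column 1, sort groups equal keys together,
# and uniq -d prints one line for each key that is repeated.
dup_keys=$(cut -f1 file1.txt | sort | uniq -d)
echo "$dup_keys"
```

Running the same pipeline on File 2 shows whether the repeats come from one file or from both, which determines how many rows a join will produce.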
Answer written 16 months ago by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087):

Use join:

join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 file1.txt ) <(sort -t $'\t' -k1,1 file2.txt )

g11701  2.83907744860811    Pfam    PF04082 Fungalspecifictranscriptionfactordomain
g11701  2.83907744860811    Pfam    PF04082 Fungalspecifictranscriptionfactordomain
g17931  2.84821618321646    Pfam    PF04082 Fungalspecifictranscriptionfactordomain
g4650   2.71483380183318    Pfam    PF00172 FungalZn(2)-Cys(6)binuclearclusterdomain
g4989   2.70224323450382
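
Because join needs sorted input, its output is in key order rather than File 2 order, and a key duplicated in File 1 yields extra rows. As an alternative sketch (not from the original answer), an awk lookup keeps one output row per File 2 row, in File 2's order. It assumes that the first value seen for a gene in File 1 is the one wanted; the file names, the array name value and the output name merged.txt are illustrative.

```shell
#!/bin/sh
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample data from the question (tab-delimited).
printf '%s\t%s\n' \
    g4989  2.70224323450382 \
    g4650  2.71483380183318 \
    g11701 2.83907744860811 \
    g11701 2.83907744860811 \
    g3807  2.83912968405616 \
    g17931 2.84821618321646 > file1.txt
{
    printf 'g4989\n'
    printf '%s\t%s\t%s\t%s\n' \
        g4650  Pfam PF00172 'FungalZn(2)-Cys(6)binuclearclusterdomain' \
        g11701 Pfam PF04082 Fungalspecifictranscriptionfactordomain \
        g17931 Pfam PF04082 Fungalspecifictranscriptionfactordomain
} > file2.txt

# Pass 1 (file1.txt, where NR == FNR): store the first value seen per gene.
# Pass 2 (file2.txt): print gene, stored value, then the remaining columns.
# A gene missing from file1 gets an empty value column.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { if (!($1 in value)) value[$1] = $2; next }
     {
         line = $1 OFS value[$1]
         for (i = 2; i <= NF; i++) line = line OFS $i
         print line
     }' file1.txt file2.txt > merged.txt

cat merged.txt
```

The values are copied at full precision. If two decimals are wanted, as in the question's example, the stored value could be formatted with awk's sprintf, keeping in mind that sprintf rounds whereas the example appears to truncate.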

Hi Pierre, thank you. I got the output file, but it does not have the same number of rows as File 2.

Comment written 16 months ago by Ambika

That happens if, as in your example, there is a duplicated key, e.g. g11701. See also the -v and -a options of join.

Comment written 16 months ago by Pierre Lindenbaum
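
To illustrate the duplicated-key point and the two options mentioned, here is a hedged sketch using the sample data from the question. It assumes it is acceptable to keep a single row per key from File 1, and it writes intermediate sorted files (file1.sorted, file2.sorted, illustrative names) instead of using process substitution so that it also runs under plain sh.

```shell
#!/bin/sh
workdir=$(mktemp -d) && cd "$workdir" || exit 1
tab=$(printf '\t')

# Sample data from the question (tab-delimited).
printf '%s\t%s\n' \
    g4989  2.70224323450382 \
    g4650  2.71483380183318 \
    g11701 2.83907744860811 \
    g11701 2.83907744860811 \
    g3807  2.83912968405616 \
    g17931 2.84821618321646 > file1.txt
{
    printf 'g4989\n'
    printf '%s\t%s\t%s\t%s\n' \
        g4650  Pfam PF00172 'FungalZn(2)-Cys(6)binuclearclusterdomain' \
        g11701 Pfam PF04082 Fungalspecifictranscriptionfactordomain \
        g17931 Pfam PF04082 Fungalspecifictranscriptionfactordomain
} > file2.txt

# With -k1,1, sort -u keeps a single line per distinct key. Note that this
# also discards rows that share a key but carry a different value.
LC_ALL=C sort -t "$tab" -k1,1 -u file1.txt > file1.sorted
LC_ALL=C sort -t "$tab" -k1,1 file2.txt > file2.sorted

# -a 2: in addition to the joined lines, print file2 lines with no partner.
LC_ALL=C join -t "$tab" -a 2 file1.sorted file2.sorted > joined.txt

# -v 2: print only the file2 lines with no partner in file1.
LC_ALL=C join -t "$tab" -v 2 file1.sorted file2.sorted > unmatched.txt

# Compare row counts: joined.txt should now match file2.txt.
wc -l < file2.txt
wc -l < joined.txt
```

LC_ALL=C is set on both sort and join so that they agree on the collation order. Lines added by -a 2 are printed as they appear in File 2, without a placeholder for the missing value column, so unmatched.txt is worth checking before relying on the column layout.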

Thank you so much for your help.

Comment written 16 months ago by Ambika

If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.


Comment written 16 months ago by Pierre Lindenbaum