Question: How to join two txt files in unix
Asked 3 months ago by xiaoyonf (Baylor College of Medicine, Houston, Texas, USA):

Hi, I have 1000 txt files, each with two columns: a gene symbol column and a mutation status column. I want to join all of these files into one file that contains the gene symbol column first, followed by 1000 sample columns of mutation status. For example, I want to join the following input files:

txt file 1:

Gene Sample
A        ID1
B        ID1
D        ID1

txt file 2:

Gene Sample
B         ID2
C         ID2
E         ID2

txt file 3, ... txt file 1000

into the output file

Gene      ID1      ID2      ID3 ... ID1000
A         yes      NA       ...
B         yes      yes      ...
C         NA       yes      ...
D         yes      NA       ...
E         NA       yes      ...
...

I know the full_join solution in R using the dplyr package, but it needs to read all the files into R. Does anyone have a simple solution in Unix to do this?

Thanks a lot! Xiaoyong

Answer, 3 months ago by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087):

Convert your files to a long format with three columns, GENE/SAMPLE/VALUE:

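 # On each header line ($1=="Gene"), store the sample name from column 2; for every other line, print GENE, SAMPLE, VALUE.
 awk '($1=="Gene"){SN=$2;next;} {printf("%s\t%s\t%s\n",$1,SN,$2);}'  input*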

and pipe the output into datamash groupby.
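
The two steps above (reshape to GENE/SAMPLE/VALUE, then pivot to one column per sample) could be sketched roughly as follows. This is only a sketch under some assumptions: the inputs look like the example files in the question, where column 2 of every data row holds the sample ID, so a listed gene is recorded as "yes"; the file names input1.txt and input2.txt are made up for the demo; and the pivot is done with plain awk, sort and cut rather than datamash, so it runs without extra tools.

```shell
#!/bin/sh
# Hypothetical demo inputs copied from the question; the file names are assumptions.
dir=$(mktemp -d)
printf 'Gene\tSample\nA\tID1\nB\tID1\nD\tID1\n' > "$dir/input1.txt"
printf 'Gene\tSample\nB\tID2\nC\tID2\nE\tID2\n' > "$dir/input2.txt"

TAB=$(printf '\t')

# Step 1: reshape all files into long format: GENE <tab> SAMPLE <tab> VALUE.
# FNR > 1 skips the header line of each file; a listed gene is recorded as "yes".
long=$(awk 'FNR > 1 { printf("%s\t%s\tyes\n", $1, $2) }' "$dir"/input*.txt)

# Step 2: pivot to wide format, filling missing gene/sample pairs with NA.
# Lines are tagged 0 (header) or 1 (data) so that sort keeps the header on top
# and orders the genes; cut then drops the tag column.
wide=$(printf '%s\n' "$long" | awk -F'\t' '
    !($2 in sample_seen) { sample_seen[$2] = 1; samples[++ns] = $2 }
    !($1 in gene_seen)   { gene_seen[$1] = 1;   genes[++ng] = $1 }
    { value[$1, $2] = $3 }
    END {
        line = "0\tGene"
        for (j = 1; j <= ns; j++) line = line "\t" samples[j]
        print line
        for (i = 1; i <= ng; i++) {
            line = "1\t" genes[i]
            for (j = 1; j <= ns; j++)
                line = line "\t" (((genes[i], samples[j]) in value) ? value[genes[i], samples[j]] : "NA")
            print line
        }
    }' | LC_ALL=C sort -t "$TAB" -k1,1 -k2,2 | cut -f2-)

printf '%s\n' "$wide"
rm -rf "$dir"
```

Sample columns come out in the order the files are read, and genes are sorted in the C locale. Only the gene/sample pairs that actually occur are held in memory. GNU datamash also offers a crosstab operation that may suit the pivot step; its manual has the exact invocation.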


Thank you so much, Pierre. I would appreciate it if you could explain the code in more detail, and how to pipe the output into datamash. It would be very helpful for me. Thanks.

Reply written 3 months ago by xiaoyonf

Hi Pierre,

I really appreciate your response. I have modified my question to make it more precise. I haven't tried your solution yet, but I am afraid that it may need to be modified too. Thanks!

Reply written 3 months ago by xiaoyonf