Question: How to join two txt files in unix
xiaoyonf (Baylor College of Medicine, Houston, Texas, USA) wrote 3 months ago:

Hi, I have 1000 txt files, each with two columns: a gene symbol column and a mutation status column. I want to join all of these files into one file that contains the gene symbol column first, followed by 1000 sample columns of mutation status. For example, I want to join the following input files:

txt file 1:

Gene Sample
A        ID1
B        ID1
D        ID1

txt file 2:

Gene Sample
B         ID2
C         ID2
E         ID2

txt file 3, ... txt file 1000

into the output file

Gene      ID1      ID2    ID3 ... ID1000
A          yes      NA      ...
B          yes      yes     ...
C          NA       yes     ...
D          yes      NA      ...
E          NA       yes     ...

I know the full_join solution in R using the dplyr package, but it needs to read all the files into R. Does anyone have a simple solution in Unix to do this?

Thanks a lot! Xiaoyong

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 3 months ago:

Convert your files to a long format with the columns GENE/SAMPLE/VALUE:

 awk '($1=="Gene"){SN=$2;next;} {printf("%s\t%s\t%s\n",$1,SN,$2);}'  input*

and pipe the output into datamash groupby.
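Note that the one-liner above takes the sample name from the second field of each header line, so it appears to assume headers such as `Gene ID1` with the status in column 2. For files laid out as in the question (header `Gene Sample`, sample ID repeated in column 2, a listed gene meaning "yes"), the same two-step idea might look like the sketch below. It swaps datamash for a plain awk pivot so that nothing beyond awk and sort is needed. The file names `input1.txt`, `input2.txt` and `matrix.tsv` and the temporary directory are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical example inputs mirroring the question (names are assumptions).
workdir=$(mktemp -d)
printf 'Gene Sample\nA ID1\nB ID1\nD ID1\n' > "$workdir/input1.txt"
printf 'Gene Sample\nB ID2\nC ID2\nE ID2\n' > "$workdir/input2.txt"

# Step 1: long format GENE<TAB>SAMPLE<TAB>VALUE.
#         Skip each file's header; every listed gene is recorded as "yes".
# Step 2: sort so that rows come out in gene order.
# Step 3: pivot to one row per gene and one column per sample,
#         writing "NA" where a gene/sample pair was never seen.
awk 'FNR == 1 { next } NF >= 2 { printf("%s\t%s\tyes\n", $1, $2) }' "$workdir"/input*.txt |
LC_ALL=C sort |
awk -F'\t' '
{
    # Remember genes and samples in order of first appearance.
    if (!($1 in seen_gene))   { seen_gene[$1] = 1;   gene[++ng] = $1 }
    if (!($2 in seen_sample)) { seen_sample[$2] = 1; sample[++ns] = $2 }
    value[$1, $2] = $3
}
END {
    # Header row: "Gene" followed by every sample ID.
    printf "Gene"
    for (j = 1; j <= ns; j++) printf "\t%s", sample[j]
    printf "\n"
    # One row per gene, filling the gaps with NA.
    for (i = 1; i <= ng; i++) {
        printf "%s", gene[i]
        for (j = 1; j <= ns; j++)
            printf "\t%s", ((gene[i], sample[j]) in value) ? value[gene[i], sample[j]] : "NA"
        printf "\n"
    }
}' > "$workdir/matrix.tsv"

cat "$workdir/matrix.tsv"
```

In this sketch the columns appear in the order in which sample IDs are first seen in the sorted stream, which is not necessarily ID1, ID2, ..., ID1000; the pivot also holds the whole table in memory, which should be fine for 1000 two-column files but is worth keeping in mind for much larger inputs.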


Thank you so much, Pierre. I would appreciate it if you could explain the code in more detail, and how to pipe its output into datamash. It would be very helpful for me. Thanks.

written 3 months ago by xiaoyonf

Hi Pierre,

I really appreciate your response. I have modified my question to make it more precise. I haven't tried your solution yet, but I am afraid that it may need to be modified too. Thanks!

written 3 months ago by xiaoyonf