How to join multiple txt files in Unix
3.5 years ago
xiaoyonf

Hi, I have 1000 txt files, each with two columns: gene symbol and mutation status. I want to join all of them into one file whose first column is the gene symbol, followed by 1000 sample columns of mutation status. For example, I want to join the following input files:

txt file 1:

Gene Sample
A        ID1
B        ID1
D        ID1

txt file 2:

Gene Sample
B         ID2
C         ID2
E         ID2

txt file 3, ... txt file 1000

into this output file:

Gene      ID1      ID2    ID3 ... ID1000
A          yes      NA      ...
B          yes      yes     ...
C          NA       yes     ...
D          yes      NA      ...
E          NA       yes     ...
...

I know how to do this in R with full_join from the dplyr package, but that requires reading all the files into R. Does anyone have a simple Unix solution?

Thanks a lot! Xiaoyong

3.5 years ago

Convert your files to a long GENE/SAMPLE/VALUE format:

 awk '($1=="Gene"){SN=$2;next;} {printf("%s\t%s\t%s\n",$1,SN,$2);}'  input*

and pipe the output into datamash groupby.
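How the awk command works: when the first word of a line is "Gene", it stores the second word (the sample name) in SN and skips the line; every other line is printed as gene, SN and the second column, separated by tabs. With the file layout now shown in the question, the sample ID is already the second column of every row, so the sketch below is an adapted variant, not the original command: it skips each file's header with FNR == 1 and writes "yes" as the value. For the genes-by-samples pivot it uses the crosstab operation of GNU datamash rather than a plain groupby. It assumes GNU datamash is installed, and the function names to_long and merge_mutations are hypothetical.

```shell
# to_long (hypothetical name): one GENE<TAB>SAMPLE<TAB>yes row per data line.
# Assumes each file has a header line followed by whitespace-separated
# "gene sampleID" rows, as in the question.
to_long() {
  awk 'FNR == 1 { next }                          # skip the header of every file
       NF >= 2  { printf("%s\t%s\tyes\n", $1, $2) }' "$@"
}

# merge_mutations (hypothetical name): pivot the long rows with GNU datamash.
#   -s            sort the input by the grouping columns first
#   crosstab 1,2  genes (column 1) become rows, samples (column 2) columns
#   first 3       each cell holds the first value of column 3 for that pair
# Gene/sample pairs that never occur are printed as N/A.
merge_mutations() {
  to_long "$@" | datamash -s crosstab 1,2 first 3
}
```

Usage would be something like merge_mutations input*.txt > merged.tsv. datamash leaves the top-left header cell empty and marks missing pairs as N/A, so a small sed step may be needed if you want the literal "Gene" and "NA" from the example output.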


Thank you so much, Pierre. I would appreciate it if you could explain the code in more detail, and how to pipe the output into datamash. That would be very helpful. Thanks.


Hi Pierre,

I really appreciate your response. I have modified my question to make it more precise. I haven't tried your solution yet, but I am afraid it may need to be modified too. Thanks!
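For the revised layout, and without needing datamash, the whole join can also be done in one awk pass that keeps every gene/sample pair in memory and prints the table at the end. This is a minimal sketch, assuming each file has a header line followed by whitespace-separated "gene sampleID" rows exactly as in the question; the function name join_mutations is hypothetical.

```shell
# join_mutations (hypothetical name): build a gene x sample table with
# "yes" where a gene is listed for a sample and "NA" otherwise.
# Assumes each file has a header line followed by whitespace-separated
# "gene sampleID" rows, as in the example files.
join_mutations() {
  awk '
    FNR == 1 { next }                   # skip the header line of every file
    NF < 2   { next }                   # skip blank or malformed lines
    {
      gene = $1; sample = $2
      # remember genes and samples in first-seen order
      if (!(gene in seen_gene))     { seen_gene[gene] = 1; genes[++ng] = gene }
      if (!(sample in seen_sample)) { seen_sample[sample] = 1; samples[++ns] = sample }
      hit[gene, sample] = 1             # this gene/sample pair is present
    }
    END {
      printf "Gene"                     # header row: Gene, then one column per sample
      for (j = 1; j <= ns; j++) printf "\t%s", samples[j]
      printf "\n"
      for (i = 1; i <= ng; i++) {       # one row per gene
        printf "%s", genes[i]
        for (j = 1; j <= ns; j++) {
          key = genes[i] SUBSEP samples[j]
          printf "\t%s", ((key in hit) ? "yes" : "NA")
        }
        printf "\n"
      }
    }
  ' "$@" | {
    # keep the header on top and sort the gene rows in byte order
    IFS= read -r header && printf '%s\n' "$header"
    LC_ALL=C sort
  }
}
```

Usage would be join_mutations input*.txt > merged.tsv. Sample columns appear in the order the files are read, so naming the files so that the glob sorts them as intended keeps ID1 ... ID1000 in sequence.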

