Question

combine columns from multiple files

0

Entering edit mode

6.7 years ago

bk11 ★ 3.1k

I have more than 100 tab delimited files with seven different columns in each files. I was to get a single file with first and 7th column of first file and seventh columns of every other files. Any help will be appreciated.

file1:

  A     B   C   D   E   F   G
gene1   6   12  3   4   0   1
gene2   8   0   2   5   10  3
gene3   9   1   1   6   11  5
gene4   10  3   0   7   2   7

file 2:

  A     H   I   J   K   K   M
gene1   0   2   1   2   4   2
gene2   10  3   5   5   7   4
gene3   20  40  7   0   9   6
gene4   1   25  9   2   10  7

file3:

file4: so on

I want to get output like this:

  A     G   M
gene1   1   2
gene2   3   4
gene3   5   6
gene4   7   7

bash • 3.2k views

ADD COMMENT • link updated 5.6 years ago by rse ▴ 100 • written 6.7 years ago by bk11 ★ 3.1k

0

Entering edit mode

Please edit your question and explain how this is related to bioinformatics. If not, the post will be closed as off-topic.

ADD REPLY • link 6.7 years ago by Ram 45k

0

Entering edit mode

If the first column of the input files are not same, paste command will give wrong results

ADD REPLY • link 5.6 years ago by rse ▴ 100

0

Entering edit mode

This comment is not an answer by itself, but a comment on an existing answer. I've moved it to a comment on the top level post.

The question clearly shows that all files share the same first column. If that were not to be the case, OP would state that.

ADD REPLY • link 5.6 years ago by Ram 45k

score 3 · Answer 1 · 2018-11-06

3

Entering edit mode

6.7 years ago

ATpoint 88k

~~paste <(cut -f1,7 file1) <(cut -f7 file2)~~

Assuming all files end on .txt using GNU parallel:

paste <(cut -f1,7 file1.txt) <(tr "\n" "\t" < <(ls *.txt | grep -v 'file1.txt' | sort -V | parallel -k "cut -f7 {}"))

or with a for loop:

paste <(cut -f1,7 file1.txt) <(tr "\n" "\t" < <(for i in `ls *.txt | sort -V`; do if [[ $i == "file1.txt" ]]; then continue; fi; cut -f7 $i; done))

ADD COMMENT • link 6.7 years ago by ATpoint 88k

0

Entering edit mode

I not only have two files. I have 100 such files.

ADD REPLY • link 6.7 years ago by bk11 ★ 3.1k

0

Entering edit mode

I edited my post. See if it works for you.

ADD REPLY • link 6.7 years ago by ATpoint 88k

0

Entering edit mode

Perhaps worth noting that you can paste as many columns as you want (up to the limit on command line length), e.g.:

$ paste <(cut -f1,7 file1) <(cut -f7 file2) <(...) <(cut -f7 fileN) > answer.txt

To do this programmatically, you might use a script to generate the <(cut -f7 fileX) statements for files 2 through N.

ADD REPLY • link 6.7 years ago by Alex Reynolds 36k

score 0 · Answer 2 · 2018-11-07

with datamash (present in most of the distro repos): (note: Do not output to a .txt file)

output;

$ paste *.txt | datamash transpose | awk 'NR==1{print};NR%7==0{print}' | datamash transpose
A   G   M
gene1   1   2
gene2   3   4
gene3   5   6
gene4   7   7

or

$ cut -f1,7 *.txt | datamash  -sg 1 collapse 2  | tr -s "," "\t"

A   G   M
gene1   1   2
gene2   3   4
gene3   5   6
gene4   7   7

input:

$ tail -n+2 *.txt
==> file1.txt <==
gene1   6   12  3   4   0   1
gene2   8   0   2   5   10  3
gene3   9   1   1   6   11  5
gene4   10  3   0   7   2   7

==> file2.txt <==
gene1   0   2   1   2   4   2
gene2   10  3   5   5   7   4
gene3   20  40  7   0   9   6
gene4   1   25  9   2   10  7