combine columns from multiple files
2
0
Entering edit mode
3.9 years ago
bk11 ▴ 60

I have more than 100 tab delimited files with seven different columns in each files. I was to get a single file with first and 7th column of first file and seventh columns of every other files. Any help will be appreciated.

file1:

  A     B   C   D   E   F   G
gene1   6   12  3   4   0   1
gene2   8   0   2   5   10  3
gene3   9   1   1   6   11  5
gene4   10  3   0   7   2   7


file 2:

  A     H   I   J   K   K   M
gene1   0   2   1   2   4   2
gene2   10  3   5   5   7   4
gene3   20  40  7   0   9   6
gene4   1   25  9   2   10  7


file3:

file4: so on

I want to get output like this:

  A     G   M
gene1   1   2
gene2   3   4
gene3   5   6
gene4   7   7

bash • 1.4k views
0
Entering edit mode

Please edit your question and explain how this is related to bioinformatics. If not, the post will be closed as off-topic.

0
Entering edit mode

If the first column of the input files are not same, paste command will give wrong results

0
Entering edit mode

This comment is not an answer by itself, but a comment on an existing answer. I've moved it to a comment on the top level post.

The question clearly shows that all files share the same first column. If that were not to be the case, OP would state that.

3
Entering edit mode
3.9 years ago
ATpoint 65k

paste <(cut -f1,7 file1) <(cut -f7 file2)

Assuming all files end on .txt using GNU parallel:

paste <(cut -f1,7 file1.txt) <(tr "\n" "\t" < <(ls *.txt | grep -v 'file1.txt' | sort -V | parallel -k "cut -f7 {}"))


or with a for loop:

paste <(cut -f1,7 file1.txt) <(tr "\n" "\t" < <(for i in ls *.txt | sort -V; do if [[ $i == "file1.txt" ]]; then continue; fi; cut -f7$i; done))

0
Entering edit mode

I not only have two files. I have 100 such files.

0
Entering edit mode

I edited my post. See if it works for you.

0
Entering edit mode

Perhaps worth noting that you can paste as many columns as you want (up to the limit on command line length), e.g.:

$paste <(cut -f1,7 file1) <(cut -f7 file2) <(...) <(cut -f7 fileN) > answer.txt  To do this programmatically, you might use a script to generate the <(cut -f7 fileX) statements for files 2 through N. ADD REPLY 0 Entering edit mode 3.9 years ago with datamash (present in most of the distro repos): (note: Do not output to a .txt file) output; $ paste *.txt | datamash transpose | awk 'NR==1{print};NR%7==0{print}' | datamash transpose
A   G   M
gene1   1   2
gene2   3   4
gene3   5   6
gene4   7   7


or

$cut -f1,7 *.txt | datamash -sg 1 collapse 2 | tr -s "," "\t" A G M gene1 1 2 gene2 3 4 gene3 5 6 gene4 7 7  input: $ tail -n+2 *.txt
==> file1.txt <==
gene1   6   12  3   4   0   1
gene2   8   0   2   5   10  3
gene3   9   1   1   6   11  5
gene4   10  3   0   7   2   7

==> file2.txt <==
gene1   0   2   1   2   4   2
gene2   10  3   5   5   7   4
gene3   20  40  7   0   9   6
gene4   1   25  9   2   10  7


Traffic: 1283 users visited in the last hour
FAQ
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.