Question: combine columns from multiple files
0
gravatar for bk11
5 months ago by
bk1130
bk1130 wrote:

I have more than 100 tab delimited files with seven different columns in each files. I was to get a single file with first and 7th column of first file and seventh columns of every other files. Any help will be appreciated.

file1:

  A     B   C   D   E   F   G
gene1   6   12  3   4   0   1
gene2   8   0   2   5   10  3
gene3   9   1   1   6   11  5
gene4   10  3   0   7   2   7

file 2:

  A     H   I   J   K   K   M
gene1   0   2   1   2   4   2
gene2   10  3   5   5   7   4
gene3   20  40  7   0   9   6
gene4   1   25  9   2   10  7

file3:

file4: so on

I want to get output like this:

  A     G   M
gene1   1   2
gene2   3   4
gene3   5   6
gene4   7   7
bash • 304 views
ADD COMMENTlink modified 5 months ago by cpad011211k • written 5 months ago by bk1130

Please edit your question and explain how this is related to bioinformatics. If not, the post will be closed as off-topic.

ADD REPLYlink written 5 months ago by RamRS21k
3
gravatar for ATpoint
5 months ago by
ATpoint15k
Germany
ATpoint15k wrote:

paste <(cut -f1,7 file1) <(cut -f7 file2)

Assuming all files end on .txt using GNU parallel:

paste <(cut -f1,7 file1.txt) <(tr "\n" "\t" < <(ls *.txt | grep -v 'file1.txt' | sort -V | parallel -k "cut -f7 {}"))

or with a for loop:

paste <(cut -f1,7 file1.txt) <(tr "\n" "\t" < <(for i in `ls *.txt | sort -V`; do if [[ $i == "file1.txt" ]]; then continue; fi; cut -f7 $i; done))
ADD COMMENTlink modified 5 months ago • written 5 months ago by ATpoint15k

I not only have two files. I have 100 such files.

ADD REPLYlink modified 5 months ago • written 5 months ago by bk1130

I edited my post. See if it works for you.

ADD REPLYlink modified 5 months ago • written 5 months ago by ATpoint15k

Perhaps worth noting that you can paste as many columns as you want (up to the limit on command line length), e.g.:

$ paste <(cut -f1,7 file1) <(cut -f7 file2) <(...) <(cut -f7 fileN) > answer.txt

To do this programmatically, you might use a script to generate the <(cut -f7 fileX) statements for files 2 through N.

ADD REPLYlink modified 5 months ago • written 5 months ago by Alex Reynolds28k
0
gravatar for cpad0112
5 months ago by
cpad011211k
India
cpad011211k wrote:

with datamash (present in most of the distro repos): (note: Do not output to a .txt file)

output;

$ paste *.txt | datamash transpose | awk 'NR==1{print};NR%7==0{print}' | datamash transpose
A   G   M
gene1   1   2
gene2   3   4
gene3   5   6
gene4   7   7

or

$ cut -f1,7 *.txt | datamash  -sg 1 collapse 2  | tr -s "," "\t"

A   G   M
gene1   1   2
gene2   3   4
gene3   5   6
gene4   7   7

input:

$ tail -n+2 *.txt
==> file1.txt <==
gene1   6   12  3   4   0   1
gene2   8   0   2   5   10  3
gene3   9   1   1   6   11  5
gene4   10  3   0   7   2   7

==> file2.txt <==
gene1   0   2   1   2   4   2
gene2   10  3   5   5   7   4
gene3   20  40  7   0   9   6
gene4   1   25  9   2   10  7
ADD COMMENTlink modified 5 months ago • written 5 months ago by cpad011211k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1206 users visited in the last hour