Question: compare two tables in files
0
gravatar for erickfqqa
4 months ago by
erickfqqa10
erickfqqa10 wrote:

I have two files in txt format, and I want to compare the first column in the file 1 and the file_2 and see which rows are different in the file_1 and add a column in the file_1 wiht 0 and 1, 0 for different and 1 for equals

file_1

Organism_1
Organism_2
Organism_3
Organism_4
Organism_5
Organism_6
Organism_7
Organism_8
Organism_9
Organism_10
Organism_11

file_2

    Organism_1
    Organism_2
    Organism_3
    Organism_4
    Organism_5
    Organism_6
    Organism_7
    Organism_8
    Organism_9
    Organism_10
    Organism_11
    Organism_12
    Organism_13
    Organism_14
    Organism_15
    Organism_16
    Organism_17
    Organism_18
    Organism_19
    Organism_20
    Organism_21
    Organism_22
    Organism_23
    Organism_24

I want a output like this file_3.txt

Organism_1  1
Organism_2  1
Organism_3  1
Organism_4  1
Organism_5  1
Organism_6  1
Organism_7  1
Organism_8  1
Organism_9  1
Organism_10 1
Organism_11 1
Organism_12 0
Organism_13 0
Organism_14 0
Organism_15 0
Organism_16 0
Organism_17 0
Organism_18 0
Organism_19 0
Organism_20 0
Organism_21 0
Organism_22 0
Organism_23 0
Organism_24 0

I've been trying with this to see which rows are different, but I don't know how to add the binary column

comm <(sort file_1.txt) <(sort file_2.txt)

I hope somebody can help me

bash linux perl • 189 views
ADD COMMENTlink modified 4 months ago by cpad011211k • written 4 months ago by erickfqqa10
1
gravatar for Pierre Lindenbaum
4 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum116k wrote:
comm <(sort file_1.txt ) <(sort file_2.txt) |\
awk -F '\t' '($1!="") {print $1,0;next;}($2!="") {print $2,0;next;} ($3!="") {print $3,1;next;}' |\
sort -V


Organism_1 1
Organism_2 1
Organism_3 1
Organism_4 1
Organism_5 1
Organism_6 1
Organism_7 1
Organism_8 1
Organism_9 1
Organism_10 1
Organism_11 1
Organism_12 0
Organism_13 0
Organism_14 0
Organism_15 0
Organism_16 0
Organism_17 0
Organism_18 0
Organism_19 0
Organism_20 0
Organism_21 0
Organism_22 0
Organism_23 0
Organism_24 0
ADD COMMENTlink written 4 months ago by Pierre Lindenbaum116k

I have tried to with your awk code, but I'm getting repeated rows an row with 0 and one row with 1 for example

Organim_12 0 Organism_12 1 Organism_13 0 Organism_13 1

ADD REPLYlink written 4 months ago by erickfqqa10
1
gravatar for cpad0112
4 months ago by
cpad011211k
India
cpad011211k wrote:
$ sort -V test1.txt test2.txt | uniq -ci | sed 's/^\s\+//g'| awk -v OFS="\t" '{print $2,$1-1}'

Organism_1  1
Organism_2  1
Organism_3  1
Organism_4  1
Organism_5  1
Organism_6  1
Organism_7  1
Organism_8  1
Organism_9  1
Organism_10 1
Organism_11 1
Organism_12 0
Organism_13 0
Organism_14 0
Organism_15 0
Organism_16 0
Organism_17 0
Organism_18 0
Organism_19 0
Organism_20 0
Organism_21 0
Organism_22 0
Organism_23 0
Organism_24 0

Using datamash / tsv-utils:

 $ sort -V test1.txt test2.txt |  datamash -s -g 1 count 1 | sort -V | awk -v OFS="\t" '{print $1,$2-1}'
 $ sort -V test1.txt test2.txt |  tsv-summarize --group-by 1 --count  | awk -v OFS="\t" '{print $1,$2-1}'
ADD COMMENTlink modified 4 months ago • written 4 months ago by cpad011211k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2033 users visited in the last hour