Question: Extract rows present in file1 and not in file2
biostarsb asked 4.1 years ago:

I have two files of gene names.

File one (40000 genes):

Gene 1
Gene 2
Gene 3
Gene b
Gene f
Gene c
Gene r
Gene z

File two (39000 genes):

Gene 1
Gene 3
Gene 2
Gene b

Is there a command line (with awk or bash) to extract the lines that exist in the first file but not in the second file?

Tags: bash, awk, gene
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) answered 4.1 years ago:

> I would like to know if there is a command line (with awk or bash) to extract the lines that exist in the first file and not in the second file

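Use comm (http://man7.org/linux/man-pages/man1/comm.1.html). Both inputs must be sorted; with -3 it suppresses the lines common to both files and prints the lines unique to either one: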

comm -3 <(sort file1.txt) <(sort file2.txt)

I tested this, but I get genes from both files, not only those in file1. I need to extract only the genes present in file1 and not in file2.

(reply by biostarsb, 4.1 years ago)

Only in the first file:

comm -23 <(sort file1.txt) <(sort file2.txt)

Only in the second file:

comm -13 <(sort file1.txt) <(sort file2.txt)
(reply by Pierre Lindenbaum, 4.1 years ago)
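A minimal sketch of the comm -23 approach run on the sample lists from the question. Two assumptions are not from the thread: LC_ALL=C is pinned so that sort and comm agree on the collation order, and the sorted copies go to temporary files instead of <(...), so the sketch also runs under a plain POSIX sh. The helper name only_in_first is hypothetical.

```shell
#!/bin/sh
# Sketch: list lines present in file1.txt but absent from file2.txt using sort + comm.
# The temp directory and the helper name only_in_first are illustrative only.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Sample gene lists copied from the question.
printf '%s\n' 'Gene 1' 'Gene 2' 'Gene 3' 'Gene b' 'Gene f' 'Gene c' 'Gene r' 'Gene z' > "$dir/file1.txt"
printf '%s\n' 'Gene 1' 'Gene 3' 'Gene 2' 'Gene b' > "$dir/file2.txt"

only_in_first() {
    # comm requires both inputs sorted in the same order, so use one locale for all steps.
    LC_ALL=C sort "$1" > "$dir/a.sorted"
    LC_ALL=C sort "$2" > "$dir/b.sorted"
    # -2 hides lines only in the second file, -3 hides lines common to both.
    LC_ALL=C comm -23 "$dir/a.sorted" "$dir/b.sorted"
}

only_in_first "$dir/file1.txt" "$dir/file2.txt"
```

On the sample data this prints Gene c, Gene f, Gene r and Gene z, in sorted order rather than in the order of file1.txt.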
Asaf (Israel) answered 4.1 years ago:

Get the genes that occur only once across both files:

cat file1.txt file2.txt | sort | uniq -c | awk '$1==1'

Get the genes in file1 but not in file2:

grep -v -w -f file2.txt file1.txt
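grep -w matches whole words, not whole lines, so a pattern can still hit part of a longer line. Since the question asked about awk, here is a sketch of an exact whole-line difference using the common two-file awk idiom (NR == FNR is true only while the first file argument is being read). It keeps the original order of file1.txt and needs no sorting, but it holds every line of file2.txt in memory. The helper name awk_only_in_first is hypothetical.

```shell
#!/bin/sh
# Sketch: exact whole-line "in file1, not in file2" with an awk hash lookup.
# The helper name awk_only_in_first is illustrative only.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Sample gene lists copied from the question.
printf '%s\n' 'Gene 1' 'Gene 2' 'Gene 3' 'Gene b' 'Gene f' 'Gene c' 'Gene r' 'Gene z' > "$dir/file1.txt"
printf '%s\n' 'Gene 1' 'Gene 3' 'Gene 2' 'Gene b' > "$dir/file2.txt"

awk_only_in_first() {
    # File arguments are swapped: $2 (the exclusion list) is read first.
    # While NR == FNR, store each line as an array key; afterwards print unseen lines.
    awk 'NR == FNR { seen[$0]; next } !($0 in seen)' "$2" "$1"
}

awk_only_in_first "$dir/file1.txt" "$dir/file2.txt"
```

This prints Gene f, Gene c, Gene r and Gene z in file1.txt order. One caveat of the NR == FNR idiom: if the exclusion file is empty, awk treats the second file as the lookup table too, and nothing is printed.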
Use uniq -u instead of uniq -c | awk '$1==1'.

(reply by Pierre Lindenbaum, 4.1 years ago)
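A sketch of the uniq -u suggestion on the question's sample lists. Note that sort | uniq -u gives the genes found in only one of the two files (in either direction), not specifically those only in file1. The second command uses a trick that is not from the thread: listing file2.txt twice makes every gene in it occur at least twice, so only genes unique to file1.txt survive. Both commands assume neither file repeats a gene; a gene listed twice in file1.txt would be dropped.

```shell
#!/bin/sh
# Sketch: uniq -u based set differences on the sample gene lists.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Sample gene lists copied from the question.
printf '%s\n' 'Gene 1' 'Gene 2' 'Gene 3' 'Gene b' 'Gene f' 'Gene c' 'Gene r' 'Gene z' > "$dir/file1.txt"
printf '%s\n' 'Gene 1' 'Gene 3' 'Gene 2' 'Gene b' > "$dir/file2.txt"

# uniq only compares adjacent lines, so sort first; -u prints lines seen exactly once.
# Result: genes present in only one of the two files.
LC_ALL=C sort "$dir/file1.txt" "$dir/file2.txt" | uniq -u

# file2.txt passed twice: its genes appear at least twice and are dropped,
# leaving genes that are only in file1.txt.
LC_ALL=C sort "$dir/file1.txt" "$dir/file2.txt" "$dir/file2.txt" | uniq -u
```

With the sample data both commands print Gene c, Gene f, Gene r and Gene z, because every gene of file2.txt is also in file1.txt; on other data the first command would also list genes unique to file2.txt.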
shenwei356 (China) answered 4.1 years ago:

Why not simply use grep?

grep -f file2.txt -v file1.txt
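  • If 'gene2' is in file2.txt, it will also remove 'gene22' from file1.txt, because each line of file2.txt is used as a regular expression that can match anywhere in a line.
  • In general, if file2.txt is big, you wouldn't want to load it into memory as a list of patterns.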
(reply by Pierre Lindenbaum, 4.1 years ago)

You're right! Thanks

(reply by shenwei356, 4.1 years ago)
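The 'gene2' vs 'gene22' problem raised in the reply can be avoided with two standard grep options: -F treats each line of the pattern file as a literal string, and -x requires it to match the whole line. The sketch below uses small hypothetical lists (gene3 is made up) to show the difference. It does not address the memory point: grep still loads every pattern, whereas the sort/comm route lets GNU sort spill large inputs to temporary files.

```shell
#!/bin/sh
# Sketch: plain grep -f versus grep -F -x -f on hypothetical gene lists.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical lists reproducing the gene2/gene22 case.
printf '%s\n' gene2 gene22 gene3 > "$dir/file1.txt"
printf '%s\n' gene2 > "$dir/file2.txt"

echo 'plain -f (regex, matches anywhere in the line):'
# '|| true' guards against grep exiting 1 (no lines selected) under set -e.
grep -v -f "$dir/file2.txt" "$dir/file1.txt" || true

echo 'with -F -x (literal string, whole line only):'
grep -v -x -F -f "$dir/file2.txt" "$dir/file1.txt" || true
```

The plain version wrongly removes gene22 and prints only gene3; with -F -x, gene22 and gene3 are both kept.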
Powered by Biostar version 2.3.0