Question: awk command to count specific field
saadleeshehreen wrote, 17 months ago:

Hi, I have a file with the following content. I want to count how many lines have 1 in field 2.

Bacteroides fragilis,0
Bacteroides fragilis,0
Salmonella enterica,1
Salmonella enterica,1
Salmonella enterica,1
Bacteroides fragilis,0
...

I used the following command:

cat f1.txt | awk {'$2 == 1'} | wc -l

But it doesn't give me the answer. Please help!


I am not able to understand the input file format. You can use the following command to count the number of 1s in field 2:

grep -o ",1" input.txt  | wc -l
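A slightly stricter variant of this idea, as a sketch assuming the 0/1 value is always the last thing on each line: anchoring the pattern with $ stops it from also counting values such as ,10, and grep -c counts matching lines directly, so wc -l is not needed. The six sample lines are copied from the question; input.txt is just the file name used above.

```shell
# Recreate the sample data from the question (comma-separated, 0/1 value in field 2).
printf '%s\n' \
  'Bacteroides fragilis,0' \
  'Bacteroides fragilis,0' \
  'Salmonella enterica,1' \
  'Salmonella enterica,1' \
  'Salmonella enterica,1' \
  'Bacteroides fragilis,0' > input.txt

# ',1$' only matches when ",1" is at the very end of the line,
# so a value such as ",10" would not be counted.
count=$(grep -c ',1$' input.txt)
echo "$count"   # prints 3
```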
(reply by MSM, 17 months ago)
lakhujanivijay (India) wrote, 17 months ago:

The weird format of your file (if it really looks like this) is hard for anyone to make sense of. But I'll explain how awk could work here, given a properly formatted tab-separated table.

Suppose your file (say file.txt) looks like this. The <tab> and <space> symbols are only placeholders; your actual file will have real whitespace (tabs and spaces) at the corresponding positions:

Bacteroides<space>fragilis<tab>0
Bacteroides<space>fragilis<tab>0
Salmonella<space>enterica<tab>1
Salmonella<space>enterica<tab>1
Salmonella<space>enterica<tab>1
Bacteroides<space>fragilis<tab>0

Now if you run

awk '$2==1{print}' file.txt | wc -l

It may not work because, by default, awk splits fields on any run of whitespace (spaces and tabs), so the space in Bacteroides<space>fragilis is also treated as a separator: $1 is Bacteroides, $2 is fragilis, and the 0/1 value ends up in $3.

Hence, you must set the field separator explicitly with -F:

awk -F "\t" '$2==1{print}' file.txt | wc -l
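A small sketch that makes the field-splitting point visible, assuming a tab-separated file.txt like the one described above (recreated here with printf from three of the sample rows): NF, the number of fields awk sees on a line, is 3 with the default separator and 2 with -F '\t', and only in the second case is the 0/1 value in $2.

```shell
# Tab-separated sample: a space inside the species name, a tab before the value.
printf 'Bacteroides fragilis\t0\nSalmonella enterica\t1\nSalmonella enterica\t1\n' > file.txt

# Default splitting: spaces and tabs both separate fields, so every line has 3 fields.
default_nf=$(awk '{print NF}' file.txt | sort -u)
echo "$default_nf"    # prints 3

# With -F '\t' only the tab separates fields: 2 fields per line.
tab_nf=$(awk -F '\t' '{print NF}' file.txt | sort -u)
echo "$tab_nf"        # prints 2

# Count lines whose second tab-separated field is 1 (counted inside awk).
count=$(awk -F '\t' '$2==1 {n++} END {print n+0}' file.txt)
echo "$count"         # prints 2
```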
awk -F "," '$2==1' file.txt | wc -l
(reply by Friederike, 17 months ago)
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 17 months ago:

Set the field separator (-F) to a comma and increment a counter N each time column 2 is 1. At the end, print the value of N (adding 0 makes it print 0 rather than an empty line when no row matches).

awk -F, '($2==1){N++;}END{print N+0;}' file.txt

But I think most people would use:

cut -d, -f 2 file.txt | grep -c -w 1
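To check that the two approaches agree, here is a small sketch that recreates the six sample lines from the question in file.txt and runs both. It also shows why printing N+0 is a little safer than printing N: if no row matches, N is never set, so a plain print N would output an empty line instead of 0.

```shell
# Sample from the question, comma-separated.
printf '%s\n' 'Bacteroides fragilis,0' 'Bacteroides fragilis,0' \
  'Salmonella enterica,1' 'Salmonella enterica,1' \
  'Salmonella enterica,1' 'Bacteroides fragilis,0' > file.txt

# awk: count rows whose 2nd comma-separated field is 1.
awk_count=$(awk -F, '$2==1 {N++} END {print N+0}' file.txt)

# cut + grep: extract column 2, then count lines where "1" appears as a whole word.
cut_count=$(cut -d, -f 2 file.txt | grep -c -w 1)

echo "$awk_count $cut_count"   # prints 3 3

# No row has 2 in column 2, so N is never incremented; N+0 still prints 0.
none=$(awk -F, '$2==2 {N++} END {print N+0}' file.txt)
echo "$none"                   # prints 0
```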

Hi, thanks. I have another, related problem. My file looks like this:

10 Lachnoclostridium sp.   0       0       0       0       1
11 Haemophilus ducreyi     0       0       0       0       1
12 Clostridiales bacterium 0       0       0       0       1
13 Escherichia albertii    0       1       0       0       1

It has 8 fields. I want to count only the lines where the value is 1 in both field 7 and field 8. How can I do that? I used the following, but it doesn't give the expected output:

awk '$4 == 0; $5 == 0; $6 == 0; $7 == 1; $8 ==1' file.txt

(reply by saadleeshehreen, 17 months ago; edited by Pierre Lindenbaum)
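Why the attempt above does not give a count: in awk, ; separates independent pattern rules, and a pattern without an action prints the line. So each condition is tested on its own, and a line is printed once for every condition it satisfies rather than once when all of them hold. A small sketch with two hypothetical lines in the same 8-column layout (the names and values are made up for illustration) shows the difference from joining the conditions with &&:

```shell
# Two hypothetical 8-column lines: the first has 1 in fields 7 and 8, the second does not.
printf '%s\n' \
  '1 Genus species 0 0 0 1 1' \
  '2 Genus species 0 1 0 0 1' > demo.txt

# Semicolons create five separate rules, each with the default {print} action:
# line 1 satisfies all 5 conditions, line 2 satisfies 3, so 8 lines are printed.
separate=$(awk '$4 == 0; $5 == 0; $6 == 0; $7 == 1; $8 == 1' demo.txt | wc -l)

# && combines the conditions into one rule, so each line is printed at most once:
# only line 1 matches, so 1 line is printed.
combined=$(awk '$4 == 0 && $5 == 0 && $6 == 0 && $7 == 1 && $8 == 1' demo.txt | wc -l)

echo "$separate $combined"
```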

Hi, you can use the following command:

awk  '$7==1 && $8==1 {print}' input.txt
(reply by MSM, 17 months ago)

Or just the following, since a pattern with no action prints the matching line by default:

awk '$7==1 && $8==1' input.txt

(reply by Pierre Lindenbaum, 17 months ago)

Thanks. It works for me. :)

(reply by saadleeshehreen, 17 months ago)

If an answer was helpful, you should upvote it; if an answer resolved your question, you should mark it as accepted.

(reply by Pierre Lindenbaum, 17 months ago)

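Try this as well (note that a bare $7 is true for any non-zero value, so this is equivalent to $7==1 only if field 7 contains nothing but 0s and 1s):

awk '$7 && $8==1' input.txt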
(reply by cpad0112, 17 months ago)
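The question asked for a count, while the commands in this thread print the matching lines. A minimal sketch, reusing the four rows from the question plus one hypothetical row (14, marked below) so that something actually matches, lets awk do the counting in the style of the earlier N++ answer; piping the printed lines to wc -l would work as well.

```shell
# Rows from the question, plus one hypothetical row (14) with 1 in fields 7 and 8.
printf '%s\n' \
  '10 Lachnoclostridium sp.   0       0       0       0       1' \
  '11 Haemophilus ducreyi     0       0       0       0       1' \
  '12 Clostridiales bacterium 0       0       0       0       1' \
  '13 Escherichia albertii    0       1       0       0       1' \
  '14 Hypothetical example    0       0       0       1       1' > input.txt

# Default whitespace splitting: $1 = id, $2-$3 = name, $4-$8 = the 0/1 columns.
# n+0 prints 0 instead of an empty line when nothing matches.
n=$(awk '$7==1 && $8==1 {n++} END {print n+0}' input.txt)
echo "$n"   # prints 1
```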
cpad0112 (India) wrote, 17 months ago:

Though the OP wants a solution in awk, here is a datamash solution:

Count of 0s and 1s per species (organism):

$ datamash -s -t "," -g 1,2 count 2 < test.txt | sed 's/,/\t/g'
Bacteroides fragilis    0   3
Salmonella enterica     1   3

Count of 0s and 1s only:

$ datamash -s -t "," -g 2 count 2 < test.txt | sed 's/,/\t/g'
0   3
1   3

Input (from the OP):

$ cat test.txt
Bacteroides fragilis,0
Bacteroides fragilis,0
Salmonella enterica,1
Salmonella enterica,1
Salmonella enterica,1
Bacteroides fragilis,0
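For comparison, a rough awk-only equivalent of the two datamash commands, assuming the same comma-separated test.txt: an associative array counts each species/value combination (or each value alone). The for (k in c) loop visits keys in no particular order, so the output is piped through sort; the output is tab-separated like the sed-converted datamash output above.

```shell
printf '%s\n' 'Bacteroides fragilis,0' 'Bacteroides fragilis,0' \
  'Salmonella enterica,1' 'Salmonella enterica,1' \
  'Salmonella enterica,1' 'Bacteroides fragilis,0' > test.txt

# Per species: key = species and value joined by a tab; c[key] counts occurrences.
per_species=$(awk -F, '{ c[$1 "\t" $2]++ } END { for (k in c) print k "\t" c[k] }' test.txt | sort)

# Per value only: key = the 0/1 value in field 2.
per_value=$(awk -F, '{ c[$2]++ } END { for (v in c) print v "\t" c[v] }' test.txt | sort)

# Expected for this sample: "Bacteroides fragilis, 0, 3" and "Salmonella enterica, 1, 3",
# then "0, 3" and "1, 3" (all tab-separated).
printf '%s\n' "$per_species" "$per_value"
```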
Powered by Biostar version 2.3.0