Question: awk command to count specific field
saadleeshehreen40 wrote:

Hi, I have a file with the following content. I want to count how many lines have 1 in the second field.

Bacteroides fragilis,0
Bacteroides fragilis,0
Salmonella enterica,1
Salmonella enterica,1
Salmonella enterica,1
Bacteroides fragilis,0
.................................
..................................

I used the following command:

cat f1.txt | awk {'$2 == 1'} | wc -l

But it doesn't give me the answer. Please help!

modified 4 months ago by cpad01129.0k • written 4 months ago by saadleeshehreen40

I can't make out the input file format. You can use the following command to count the number of 1s in field 2:

grep -o ",1" input.txt  | wc -l
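One caveat with the pattern above: it is unanchored, so it would also count a value such as ,10. A minimal sketch of the difference, assuming a made-up file name and one made-up row (Escherichia coli,10) that is not in the OP's data:

```shell
# Sample data in the OP's comma-separated layout.
# The file name and the ",10" row are invented for illustration only.
printf '%s\n' \
  'Bacteroides fragilis,0' \
  'Salmonella enterica,1' \
  'Salmonella enterica,1' \
  'Escherichia coli,10' > sample_grep.csv

# Unanchored: also matches the ",1" at the start of ",10".
loose=$(grep -o ',1' sample_grep.csv | wc -l | tr -d ' ')

# Anchored with $: counts only lines that end in exactly ",1".
strict=$(grep -c ',1$' sample_grep.csv)

echo "loose=$loose strict=$strict"
```

If the second field only ever holds 0 or 1, both forms give the same count.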
written 4 months ago by MSM5570
India
Vijay Lakhujani3.0k wrote:

The format of your file (if it really looks like this) is hard to make sense of, but I'll explain how awk would work on a cleanly formatted tab-separated table.

Suppose your file (say file.txt) looks like this. The <tab> and <space> symbols are only for illustration; your actual file will have real whitespace (tabs and spaces) at the corresponding positions:

Bacteroides<space>fragilis<tab>0
Bacteroides<space>fragilis<tab>0
Salmonella<space>enterica<tab>1
Salmonella<space>enterica<tab>1
Salmonella<space>enterica<tab>1
Bacteroides<space>fragilis<tab>0

Now if you say

awk '$2==1{print}' file.txt | wc -l

It will not give the right count because, by default, awk splits fields on any run of whitespace (spaces as well as tabs). The space in Bacteroides<space>fragilis therefore also separates fields, so $2 is fragilis rather than the 0/1 column.

Hence, you must set the field separator with -F:

awk -F "\t" '$2==1{print}' file.txt | wc -l
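To see the difference for yourself, here is a small sketch that prints what awk takes as $2 with and without -F. The file name and the three rows are assumptions made up for the demonstration:

```shell
# Hypothetical tab-separated sample: species name, then a 0/1 flag.
printf 'Bacteroides fragilis\t0\nSalmonella enterica\t1\nSalmonella enterica\t1\n' > sample_tab.txt

# Default splitting: spaces and tabs both separate fields,
# so $2 on the first line is the second word of the species name.
default_second=$(awk 'NR==1 {print $2}' sample_tab.txt)

# Tab as the only separator: $2 on the first line is the flag column.
tab_second=$(awk -F '\t' 'NR==1 {print $2}' sample_tab.txt)

# Count rows whose flag is 1; n+0 prints 0 instead of an empty line when nothing matches.
tab_count=$(awk -F '\t' '$2==1 {n++} END {print n+0}' sample_tab.txt)

echo "default: $default_second / tab: $tab_second / count: $tab_count"
```

Counting inside awk like this also saves the extra wc -l process.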
written 4 months ago by Vijay Lakhujani3.0k
awk -F "," '$2==1' file.txt | wc -l
modified 4 months ago • written 4 months ago by Friederike2.1k
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum112k wrote:

Set the field separator (-F) to a comma and increment a counter N each time column 2 is 1. At the end, print the value of N.

awk -F, '($2==1){N++;}END{print N;}' file.txt

but I think most people would use

cut -d, -f 2 file.txt | grep -c -w 1
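A quick check that both approaches agree on the OP's six sample lines. The file name is an assumption, and the n+0 is a small addition of mine so that awk prints 0 rather than an empty line when nothing matches:

```shell
# The six sample lines from the question, written to a scratch file.
printf '%s\n' \
  'Bacteroides fragilis,0' \
  'Bacteroides fragilis,0' \
  'Salmonella enterica,1' \
  'Salmonella enterica,1' \
  'Salmonella enterica,1' \
  'Bacteroides fragilis,0' > sample_count.csv

# awk: count lines whose second comma-separated field equals 1.
ones=$(awk -F, '$2==1 {n++} END {print n+0}' sample_count.csv)

# Same idea for a value that never occurs; n+0 forces a numeric 0.
twos=$(awk -F, '$2==2 {n++} END {print n+0}' sample_count.csv)

# cut + grep: extract field 2, then count lines containing the whole word 1.
via_cut=$(cut -d, -f 2 sample_count.csv | grep -c -w 1)

echo "ones=$ones twos=$twos via_cut=$via_cut"
```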
written 4 months ago by Pierre Lindenbaum112k

Hi, thanks. I have another related problem. My file looks like this:

10 Lachnoclostridium sp.   0       0       0       0       1
11 Haemophilus ducreyi     0       0       0       0       1
12 Clostridiales bacterium 0       0       0       0       1
13 Escherichia albertii    0       1       0       0       1

It has 8 fields. I just want to count the lines where the value is 1 in both field 7 and field 8. How can I do that? I used the following, but it does not give the expected output.

awk '$4 == 0; $5 == 0; $6 == 0; $7 == 1; $8 ==1' file.txt

modified 4 months ago by Pierre Lindenbaum112k • written 4 months ago by saadleeshehreen40

Hi, you can use the following command:

awk  '$7==1 && $8==1 {print}' input.txt
written 4 months ago by MSM5570

or just awk '$7==1 && $8==1' input.txt
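For anyone wondering why the semicolon version misbehaves: in awk, patterns separated by ; are independent rules, so a line is printed once for every rule it satisfies. A rough sketch, assuming a scratch file with two of the OP's rows plus one made-up row (14 Example species) that actually has 1 in both fields:

```shell
# Two rows from the question plus one invented row (14) for illustration.
printf '%s\n' \
  '12 Clostridiales bacterium 0 0 0 0 1' \
  '13 Escherichia albertii 0 1 0 0 1' \
  '14 Example species 0 0 0 1 1' > sample_fields.txt

# Five separate rules: each line is printed once per rule it matches,
# so the output has more lines than the input.
separate_rules=$(awk '$4 == 0; $5 == 0; $6 == 0; $7 == 1; $8 == 1' sample_fields.txt | wc -l | tr -d ' ')

# One rule joined with &&: only lines meeting both conditions are counted.
both=$(awk '$7==1 && $8==1 {n++} END {print n+0}' sample_fields.txt)

echo "separate_rules=$separate_rules both=$both"
```

The second command prints the count directly, which is what the question asked for; drop the {n++} END {...} part to see the matching lines instead.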

modified 4 months ago • written 4 months ago by Pierre Lindenbaum112k

Thanks. It works for me. :)

written 4 months ago by saadleeshehreen40

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

written 4 months ago by Pierre Lindenbaum112k

Try this as well (it works here because field 7 only ever holds 0 or 1; awk treats any non-zero number as true):

awk '$7 && $8==1'  input.txt
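Note that a bare $7 tests for "non-zero", not "equal to 1". A small sketch of where the two differ, assuming a scratch file in which rows 14 and 15 are made up and row 15 carries a 2 in field 7:

```shell
# Row 13 is from the question; rows 14 and 15 are invented for illustration.
printf '%s\n' \
  '13 Escherichia albertii 0 1 0 0 1' \
  '14 Example species 0 0 0 1 1' \
  '15 Example species 0 0 0 2 1' > sample_truthy.txt

# Bare $7 is true for any non-zero number, so the row with 2 also matches.
truthy=$(awk '$7 && $8==1 {n++} END {print n+0}' sample_truthy.txt)

# Explicit comparison matches only rows where field 7 is exactly 1.
exact=$(awk '$7==1 && $8==1 {n++} END {print n+0}' sample_truthy.txt)

echo "truthy=$truthy exact=$exact"
```

With strictly 0/1 data the two commands are interchangeable.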
written 4 months ago by cpad01129.0k
India
cpad01129.0k wrote:

Though the OP wants a solution in awk, here is a datamash solution:

Counts of 0s and 1s per species (organism):

$ datamash -s -t "," -g 1,2 count 2 < test.txt | sed 's/,/\t/g'
Bacteroides fragilis    0    3
Salmonella enterica     1    3

Counts of 0s and 1s only:

$ datamash -s -t "," -g 2 count 2 < test.txt | sed 's/,/\t/g'
0    3
1    3

input (from OP):

$ cat test.txt
Bacteroides fragilis,0
Bacteroides fragilis,0
Salmonella enterica,1
Salmonella enterica,1
Salmonella enterica,1
Bacteroides fragilis,0
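
If datamash is not installed, roughly the same grouped counts can be produced with an awk associative array. This is only a sketch of an equivalent, not what datamash does internally, and the file name is an assumption:

```shell
# The OP's six sample lines, written to a scratch file.
printf '%s\n' \
  'Bacteroides fragilis,0' \
  'Bacteroides fragilis,0' \
  'Salmonella enterica,1' \
  'Salmonella enterica,1' \
  'Salmonella enterica,1' \
  'Bacteroides fragilis,0' > test_groups.csv

# Count lines per value of field 2; sort because awk's for-in order is unspecified.
by_value=$(awk -F, '{c[$2]++} END {for (k in c) print k "\t" c[k]}' test_groups.csv | sort)

# Count lines per (species, value) pair by using both fields as the array key.
by_species=$(awk -F, '{c[$1 "\t" $2]++} END {for (k in c) print k "\t" c[k]}' test_groups.csv | sort)

printf '%s\n' "$by_value" "$by_species"
```

The output is tab-separated, like the datamash output after the sed step.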
modified 4 months ago • written 4 months ago by cpad01129.0k