Question: awk command to count specific field
4 weeks ago by
saadleeshehreen40 wrote:

Hi, I have a file with the following content. I want to count how many lines have 1 in the second field.

Bacteroides fragilis,0
Bacteroides fragilis,0
Salmonella enterica,1
Salmonella enterica,1
Salmonella enterica,1
Bacteroides fragilis,0
.................................
..................................

I used the following command :

cat f1.txt | awk {'$2 == 1'} | wc -l

But it doesn't give me the answer. Please help!

ADD COMMENTlink modified 4 weeks ago by cpad01126.4k • written 4 weeks ago by saadleeshehreen40

I do not fully understand the input file format, but you can use the following command to count the number of 1s in field 2:

grep -o ",1" input.txt  | wc -l
ADD REPLYlink written 4 weeks ago by MSM5570
4 weeks ago by
Vijay Lakhujani2.5k
India
Vijay Lakhujani2.5k wrote:

The format of your file (if it really looks like this) is hard to understand, but I'll explain how awk would work given a cleanly formatted, tab-separated table.

Consider your file (say file.txt) laid out as follows. The <tab> and <space> symbols are only for illustration; your actual file will contain real whitespace (tabs and spaces) at the positions shown.

Bacteroides<space>fragilis<tab>0
Bacteroides<space>fragilis<tab>0
Salmonella<space>enterica<tab>1
Salmonella<space>enterica<tab>1
Salmonella<space>enterica<tab>1
Bacteroides<space>fragilis<tab>0

Now if you say

awk '$2==1{print}' file.txt | wc -l

It may not work because, by default, awk splits fields on any run of whitespace (spaces or tabs). The space inside "Bacteroides fragilis" therefore acts as a separator, so $2 is the second word of the name rather than the number.

Hence, you must specify the field separator with -F:

awk -F "\t" '$2==1{print}' file.txt | wc -l
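To see the difference between default splitting and tab-only splitting, here is a minimal sketch. The temporary file and its three rows are made up for illustration; they only mimic the layout described above.

```shell
# Hypothetical sample: a space inside the species name, a tab before the number
sample=$(mktemp)
printf 'Bacteroides fragilis\t0\nSalmonella enterica\t1\nSalmonella enterica\t1\n' > "$sample"

# Default splitting: any run of spaces or tabs separates fields
default_f2=$(awk 'NR==1 {print $2}' "$sample")
default_nf=$(awk 'NR==1 {print NF}' "$sample")

# Tab-only splitting: the species name stays in one field
tab_f2=$(awk -F '\t' 'NR==1 {print $2}' "$sample")
tab_hits=$(awk -F '\t' '$2==1 {n++} END {print n+0}' "$sample")

echo "default: \$2=$default_f2 (NF=$default_nf); tab: \$2=$tab_f2, lines with 1: $tab_hits"
rm -f "$sample"
```

With default splitting the number ends up in $3, so testing $3 would also work here, but only as long as every name has exactly two words.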
ADD COMMENTlink written 4 weeks ago by Vijay Lakhujani2.5k
awk -F "," '$2==1' file.txt | wc -l
ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Friederike1.8k
4 weeks ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum108k wrote:

Set the field separator (-F) to a comma and increment a counter N each time column 2 is 1. At the end, print the value of N.

awk -F, '($2==1){N++;}END{print N;}' file.txt

but I think most people would use

cut -d, -f 2 file.txt | grep -c -w 1
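As a small sketch of both approaches on the data from the question (the temporary file is only for illustration): printing `N+0` instead of `N` is an optional tweak that makes awk print 0 rather than an empty line when nothing matches.

```shell
# Sample rows copied from the question, written to a temporary file
sample=$(mktemp)
printf '%s\n' \
  'Bacteroides fragilis,0' 'Bacteroides fragilis,0' \
  'Salmonella enterica,1' 'Salmonella enterica,1' \
  'Salmonella enterica,1' 'Bacteroides fragilis,0' > "$sample"

# Count inside awk; N+0 forces a numeric 0 if N was never incremented
awk_count=$(awk -F, '$2==1 {N++} END {print N+0}' "$sample")

# Same count with cut and grep
grep_count=$(cut -d, -f 2 "$sample" | grep -c -w 1)

# A value that never occurs: still prints 0, not an empty line
none_count=$(awk -F, '$2==7 {N++} END {print N+0}' "$sample")

echo "awk: $awk_count, cut+grep: $grep_count, no match: $none_count"
rm -f "$sample"
```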
ADD COMMENTlink written 4 weeks ago by Pierre Lindenbaum108k

Hi, thanks. I have another related problem. My file looks like this:

10 Lachnoclostridium sp.   0       0       0       0       1
11 Haemophilus ducreyi     0       0       0       0       1
12 Clostridiales bacterium 0       0       0       0       1
13 Escherichia albertii    0       1       0       0       1

It has 8 fields. I just want to count the lines where the value is 1 in both field 7 and field 8. How can I do that? I used the following, but it does not give the expected output.

awk '$4 == 0; $5 == 0; $6 == 0; $7 == 1; $8 ==1' file.txt

ADD REPLYlink modified 4 weeks ago by Pierre Lindenbaum108k • written 4 weeks ago by saadleeshehreen40

Hi, you can use the following command:

awk  '$7==1 && $8==1 {print}' input.txt
ADD REPLYlink written 4 weeks ago by MSM5570

or just awk '$7==1 && $8==1' input.txt
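For context on why the semicolon version misbehaves: in awk, `;` separates independent pattern rules, so each true test prints the line again, whereas `&&` requires both tests on the same line. The sketch below uses the four rows from the question plus one hypothetical row (number 14, invented here) so that at least one line matches, and it counts the matches instead of printing them.

```shell
# Rows 10-13 are from the question; row 14 is a made-up example that matches
sample=$(mktemp)
printf '%s\n' \
  '10 Lachnoclostridium sp. 0 0 0 0 1' \
  '11 Haemophilus ducreyi 0 0 0 0 1' \
  '12 Clostridiales bacterium 0 0 0 0 1' \
  '13 Escherichia albertii 0 1 0 0 1' \
  '14 Hypothetical species 0 0 0 1 1' > "$sample"

# Separate rules: a line is printed once for every test that is true
semicolon_lines=$(awk '$7 == 1; $8 == 1' "$sample" | awk 'END {print NR}')

# Combined condition, counted inside awk
and_count=$(awk '$7==1 && $8==1 {n++} END {print n+0}' "$sample")

# Same test counted from the end of the line
from_end=$(awk '$(NF-1)==1 && $NF==1 {n++} END {print n+0}' "$sample")

echo "semicolon rules printed $semicolon_lines lines; && matched $and_count; from end: $from_end"
rm -f "$sample"
```

If species names can contain a different number of words, the field numbers shift, so counting from the end with `$(NF-1)` and `$NF` may be the more robust choice.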

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Pierre Lindenbaum108k

Thanks. It works for me. :)

ADD REPLYlink written 4 weeks ago by saadleeshehreen40

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

ADD REPLYlink written 4 weeks ago by Pierre Lindenbaum108k

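Try this as well (note that $7 on its own is true for any non-zero value, so it is equivalent to $7==1 only if the column contains nothing but 0s and 1s):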

awk '$7 && $8==1'  input.txt
ADD REPLYlink written 4 weeks ago by cpad01126.4k
4 weeks ago by
cpad01126.4k
cpad01126.4k wrote:

Though OP wants a solution in awk, here is a datamash solution:

Counts of 0s and 1s per species (organism):

$ datamash -s -t "," -g 1,2 count 2 < test.txt | sed 's/,/\t/g'
Bacteroides fragilis    0   3
Salmonella enterica 1   3

Counts of 0s and 1s only:

$ datamash -s -t "," -g 2 count 2 < test.txt | sed 's/,/\t/g'
0   3
1   3

input (from OP):

 $ cat test.txt 
Bacteroides fragilis,0
Bacteroides fragilis,0
Salmonella enterica,1
Salmonella enterica,1
Salmonella enterica,1
Bacteroides fragilis,0
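
For readers without datamash, similar counts can be sketched in plain awk with an associative array. This is only a rough equivalent, not a drop-in replacement; the temporary file stands in for test.txt, and `sort` is added because awk does not guarantee the order of `for (k in c)`.

```shell
# Same rows as test.txt above, written to a temporary file
sample=$(mktemp)
printf '%s\n' \
  'Bacteroides fragilis,0' 'Bacteroides fragilis,0' \
  'Salmonella enterica,1' 'Salmonella enterica,1' \
  'Salmonella enterica,1' 'Bacteroides fragilis,0' > "$sample"

# Count per value of column 2
per_value=$(awk -F, '{c[$2]++} END {for (k in c) print k "\t" c[k]}' "$sample" | sort)

# Count per (species, value) pair
per_species=$(awk -F, '{c[$1 "\t" $2]++} END {for (k in c) print k "\t" c[k]}' "$sample" | sort)

printf '%s\n\n%s\n' "$per_value" "$per_species"
rm -f "$sample"
```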
ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by cpad01126.4k