summing taxon counts per participant
2
0
Entering edit mode
11 months ago

I am trying to sum taxa per participant in this partial of a dataset. In the first column there are repeating taxa, the other columns are participants and counts of the taxa. When I had only one person, I used the =SUMIF($A$2:$A$15, E2, $B$2:$B$15) function after removing duplicates in excel which did the trick. I am trying to avoid having to do it 150 times for each participant. Would anyone have an idea, either in Excel or R?

Taxon                  A              B          C                 D
Bacteroides    193,568     30,950     114,136  390,546
Prevotella     0               364,092    0                  0
Bacteroides    352,626     389,938 8,492   213,736
Fusicatenibacter   669,374 402,290 317,594 178,340
Bacteroides    563,284      14,114 832,400 30,958
Bacteroides        0                     0           0          78,302
Agathobacter     204,810       117,144 180,104 69,296
Alistipes               0                136,800     0            8,430
Blautia              161,132        261,660      9,092 118,602
Ruminococcus  75,010             0           282   26,368
Bacteroides       50,782         23,066    61,236  37,214
Prevotella    91,560              0             19,676 69,752
Blautia              125,734              0            26,750  93,874
Alistipes        139,518         51,154    87,490   5,266
Agathobacter      4,854          13,830    34,772   1,782

R excel • 663 views
0
Entering edit mode

How do you know which duplicate taxon to remove?

0
Entering edit mode

I remove all duplicates, then sum counts with the SUMIF function in excel. Works for one participant/column and I am looking for a formula which does it for all columns.

1
Entering edit mode
11 months ago
zx8754 11k

I think you want to group by Taxon then sum all columns.

Excel, use pivot:

R, see this SO post: - Aggregate / summarize multiple variables per group (e.g. sum, mean)

0
Entering edit mode

Fantastic, thank you so much.

1
Entering edit mode
11 months ago

With R you can use the function aggregate

aggregate(. ~ Taxon, df, sum) # df is the input data.frame

0
Entering edit mode

Love it, thank you very much.