Question: Finding The Mean Of Values In A Single Column
robjohn7000 (United Kingdom) wrote:

I have a data frame (process.yield):

process    Yield
35        0.38
37        0.29
89        0.75
90        0.82

I want R to calculate the mean of values in column 2 ("Yield"). This seems trivial, but somehow after applying functions like apply, aggregate, mean, by, I have not been able to get the right results. I'm guessing there is a problem with my data frame.

Example: Aggregate function:

process.yield.mean <- aggregate(process.yield, by=list(process.yield$Yield), FUN=mean)

Error from aggregate function:

1: In mean.default(X[[1L]], ...) :
 argument is not numeric or logical: returning NA 
  etc etc

Can anyone help please?

Tags: R
written 6.9 years ago by robjohn7000 • modified 6.9 years ago by Michael Dondrup

The error message is telling you that the thing you are passing to mean() isn't a numeric or logical vector. Use class() to work out what your column is (most likely a character vector?) and convert it. If you are reading this data in from a .csv file, you may find that some of the entries in that column are not, in fact, numbers.

written 6.9 years ago by David W
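
A quick sketch of the check suggested above. The vector yield_raw and its "n/a" entry are made up for illustration; they stand in for a column that was read in as text.

```r
# Hypothetical column read in as text, with one entry that is not a number.
yield_raw <- c("0.38", "0.29", "n/a", "0.82")

class(yield_raw)   # "character"

# as.numeric() turns anything it cannot parse into NA (with a warning),
# so the NA positions point at the offending entries.
converted <- suppressWarnings(as.numeric(yield_raw))
yield_raw[is.na(converted)]   # "n/a"

# Mean of the entries that did convert.
mean(converted, na.rm = TRUE)
```

Running class() on the real column (process.yield$Yield) is the first step; the rest only matters if it comes back as "character".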
Devon Ryan (Freiburg, Germany) wrote:

How about just mean(process.yield$Yield)? That would seem rather simpler.

written 6.9 years ago by Devon Ryan

mean(process.yield$Yield) returned NA with this warning:

[1] NA
Warning message:
In mean.default(process.yield$Yield) :
  argument is not numeric or logical: returning NA

written 6.9 years ago by robjohn7000

What about mean(as.numeric(process.yield$Yield))?

written 6.9 years ago by Mitch Bekritsky

It's actually probably a factor, in which case as.numeric(levels(x))[x] is the way to go, as per ?factor. In any case, the correct diagnosis for the problem is in the error message...

modified 6.9 years ago • written 6.9 years ago by David W
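
To see why the factor case needs the levels() idiom rather than a plain as.numeric(), here is a small sketch. The factor x is a made-up stand-in for process.yield$Yield, using the four yields from the question.

```r
# Hypothetical factor column standing in for process.yield$Yield.
x <- factor(c("0.38", "0.29", "0.75", "0.82"))

# as.numeric() on a factor returns the internal level codes, not the labels.
as.numeric(x)          # 2 1 3 4
mean(as.numeric(x))    # 2.5, silently wrong

# Convert the level labels to numbers, then index them by the factor.
vals <- as.numeric(levels(x))[x]   # 0.38 0.29 0.75 0.82
mean(vals)                         # 0.56
```

The first version runs without any warning, which is what makes it easy to miss.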

That was exceptional, David. as.numeric(levels(x))[x] did it. Why do you think a factor was introduced, since all I did was use cbind() to combine the 2 columns in the data frame?

written 6.9 years ago by robjohn7000

It's a bit complex, but cbind() and rbind() return matrices when given plain vectors. A matrix can hold only one data type, so numerics are converted to characters if any of the things being bound are characters. as.data.frame() then converts character vectors to factors by default. If you have mixed types, it's usually best to use data.frame(x = my_numeric, y = my_char, z = my_factors).

written 6.9 years ago by David W
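
A minimal sketch of that chain of conversions. It assumes the process labels were stored as character strings, which the thread does not confirm. stringsAsFactors = TRUE is passed explicitly because the default for this argument became FALSE in R 4.0.0.

```r
# Assumption: the process labels were character strings.
process <- c("35", "37", "89", "90")
yield   <- c(0.38, 0.29, 0.75, 0.82)

# cbind() on plain vectors builds a matrix, so every value becomes character.
m <- cbind(process = process, Yield = yield)
class(m[, "Yield"])    # "character"

# Converting the matrix with factors enabled reproduces the factor column.
bad <- as.data.frame(m, stringsAsFactors = TRUE)
class(bad$Yield)       # "factor"

# data.frame() keeps each column's own type.
good <- data.frame(process = process, Yield = yield)
class(good$Yield)      # "numeric"
mean(good$Yield)       # 0.56
```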

Thanks again David.

ADD REPLYlink written 6.9 years ago by robjohn7000100

Then as David W. suggested above, those are probably characters, not numbers. Try converting with as.numeric().

written 6.9 years ago by Devon Ryan
always_learning (Doha, Qatar) wrote:

The summary(process.yield) command will also give the mean.

written 6.9 years ago by always_learning
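
For reference, a short sketch of what summary() reports, using the four yields from the question as a plain numeric vector. summary() only includes a mean for numeric columns; for a factor it returns counts per level instead.

```r
# The four yields from the question, as a numeric vector.
yield <- c(0.38, 0.29, 0.75, 0.82)

# For numeric input, summary() includes the mean among its statistics.
s <- summary(yield)
s[["Mean"]]            # 0.56

# For a factor, summary() gives a count per level and no mean.
summary(factor(yield))
```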

Thanks all for your comments. mean() and summary() should have worked, but so far they have not, and I suspect the way I put the data frame together in the first place. The process.yield frame was obtained by combining the Process and Yield columns using cbind(). mean() worked fine on the Yield column before it was combined with the Process column.

written 6.9 years ago by robjohn7000