Question: Finding The Mean Of Values In A Single Column
robjohn7000 (United Kingdom) wrote, 5.5 years ago:

I have a data frame (process.yield):

process    Yield
35        0.38
37        0.29
89        0.75
90        0.82

I want R to calculate the mean of the values in column 2 ("Yield"). This seems trivial, but after trying functions such as apply(), aggregate(), mean() and by(), I have not been able to get the right result. I suspect there is a problem with my data frame.

For example, with aggregate():

process.yield.mean <- aggregate(process.yield, by=list(process.yield$Yield), FUN=mean)

Warning messages from aggregate():

1: In mean.default(X[[1L]], ...) :
 argument is not numeric or logical: returning NA 
  (and so on)

Can anyone help please?

Tag: R • 32k views • question modified 5.5 years ago by Michael Dondrup

The error message is telling you that the thing you are passing to mean() isn't a numeric or logical vector. Use class() to work out what your column is (most likely a character vector?) and convert it. If you are reading this data in from a .csv file, you may find that some of the entries in that column are not, in fact, numbers.

Reply by David W, 5.5 years ago
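
A minimal R sketch of the check described above. It assumes, hypothetically, that Yield was read in as text; the four values come from the question's table, and `stringsAsFactors = FALSE` is only there to reproduce a character column consistently across R versions.

```r
# Hypothetical reconstruction of the problem: the Yield values from the
# question, but stored as text (as can happen when a .csv column contains
# a stray non-numeric entry). stringsAsFactors = FALSE keeps them as
# character rather than factor in every R version.
process.yield <- data.frame(
  process = c(35, 37, 89, 90),
  Yield   = c("0.38", "0.29", "0.75", "0.82"),
  stringsAsFactors = FALSE
)

# Check the column type before calling mean().
class(process.yield$Yield)   # "character": mean() on this warns and returns NA
str(process.yield)           # shows the type of every column at once

# Convert the character column to numeric, then take the mean.
process.yield$Yield <- as.numeric(process.yield$Yield)
yield.mean <- mean(process.yield$Yield)
yield.mean
```

If as.numeric() itself warns "NAs introduced by coercion", the entries that became NA are the ones that were not numbers in the original file.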

Answer by Devon Ryan (Freiburg, Germany), 5.5 years ago:

How about just mean(process.yield$Yield)? That would seem rather simpler.


mean(process.yield$Yield) gave this result:

[1] NA
Warning message:
In mean.default(process.yield$Yield) :
  argument is not numeric or logical: returning NA

Reply by robjohn7000, 5.5 years ago

What about mean(as.numeric(process.yield$Yield))?

Reply by Mitch Bekritsky, 5.5 years ago
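
A short R sketch of why as.numeric() on its own can be misleading: it parses a character vector, but on a factor it returns the internal level codes. The names yield.chr and yield.factor are hypothetical; the values come from the table in the question.

```r
# Hypothetical Yield columns: one stored as character, one as factor.
yield.chr    <- c("0.38", "0.29", "0.75", "0.82")
yield.factor <- factor(yield.chr)

# For a character vector, as.numeric() parses the text, so this works.
mean(as.numeric(yield.chr))

# For a factor, as.numeric() returns the internal level codes instead.
# Levels sort as "0.29", "0.38", "0.75", "0.82", so the codes are 2, 1, 3, 4.
codes <- as.numeric(yield.factor)
mean(codes)   # the mean of the codes, not of the yields
```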

It's probably actually a factor, in which case as.numeric(levels(x))[x] is the way to go, as recommended in ?factor. In any case, the correct diagnosis of the problem is in the error message.

Reply by David W, 5.5 years ago (edited)
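
A sketch of the ?factor idiom, using a hypothetical factor x built from the question's Yield values. Indexing a vector by a factor uses the factor's integer codes, so each level is converted to a number once and then looked up for every element.

```r
# Hypothetical factor holding numeric labels.
x <- factor(c("0.38", "0.29", "0.75", "0.82"))

# Convert each level once, then index by the factor's integer codes.
x.num <- as.numeric(levels(x))[x]
mean(x.num)

# Equivalent result, but converts every element rather than each level once.
x.num2 <- as.numeric(as.character(x))
identical(x.num, x.num2)
```

The two forms give the same values; ?factor describes the levels-based version as slightly more efficient, which mainly matters when a long factor has few distinct levels.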

That was exceptional, David: as.numeric(levels(x))[x] did it. Why do you think a factor was introduced, when all I did was use cbind() to combine the two columns into the data frame?

Reply by robjohn7000, 5.5 years ago

It's a bit complex, but cbind() and rbind() applied to vectors return matrices, which can only hold one data type, so numbers are converted to characters if any of the things being bound contain characters. as.data.frame() then converts character vectors to factors by default (true before R 4.0.0, which changed the stringsAsFactors default to FALSE). If you have mixed types, it's usually best to use data.frame(x = my_numeric, y = my_chars, z = my_factors).

Reply by David W, 5.5 years ago
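
A sketch of the type coercion described above. The label column is a hypothetical stand-in for whatever non-numeric value ended up among the bound columns; the thread does not say which one it was. The process and Yield values are from the question.

```r
process <- c(35, 37, 89, 90)
Yield   <- c(0.38, 0.29, 0.75, 0.82)
label   <- c("a", "b", "c", "d")   # hypothetical non-numeric column

# A matrix holds a single type, so one character column forces every
# value, including the numbers, to character.
m <- cbind(process, Yield, label)
typeof(m)                  # "character"

# as.data.frame() on that matrix gives character columns in R >= 4.0.0,
# or factor columns in earlier R (stringsAsFactors defaulted to TRUE).
df.bad <- as.data.frame(m)
class(df.bad$Yield)

# data.frame() on the original vectors keeps each column's own type.
df.good <- data.frame(process = process, Yield = Yield, label = label)
class(df.good$Yield)       # "numeric"
mean(df.good$Yield)
```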

Thanks again, David.

Reply by robjohn7000, 5.5 years ago

Then, as David W suggested above, those are probably characters, not numbers. Try converting them with as.numeric().

Reply by Devon Ryan, 5.5 years ago

Answer by always_learning (Doha, Qatar), 5.5 years ago:

The summary(process.yield) command will also give the mean.

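A minimal sketch, assuming Yield is already stored as numeric (for example after one of the conversions discussed above). It uses the question's values; the names s and f.summary are hypothetical. It also shows why summary() did not help with the original data frame: for a factor column it reports level counts, not a mean.

```r
# Hypothetical data frame with Yield stored correctly as numeric.
process.yield <- data.frame(
  process = c(35, 37, 89, 90),
  Yield   = c(0.38, 0.29, 0.75, 0.82)
)

# For numeric columns, summary() reports Min., quartiles, Median, Mean, Max.
summary(process.yield)

# The statistics for one column can be pulled out by name.
s <- summary(process.yield$Yield)
s[["Mean"]]

# For a factor column, summary() returns counts per level instead.
f.summary <- summary(factor(c("0.38", "0.29", "0.75", "0.82")))
f.summary
```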

Thanks, all, for your comments. mean() and summary() should have worked, but so far they have not, and I suspect the way I put the data frame together in the first place. The process.yield data frame was created by combining the process and Yield columns with cbind(). mean() worked fine on the Yield column before it was combined with the process column.

Reply by robjohn7000, 5.5 years ago