Question: Significance testing on regression with binned data
selplat2110 wrote:

I have binned the following data by year, but am wondering how I can assess the significance of the resulting regression, given that the data are binned. I would like to run this regression separately for multiple traits against time binned in years.

```r
# Assign each record to a 20-year bin; labels = FALSE returns integer bin codes
Data$cuts <- cut(Data$year, breaks = c(seq(min(Data$year), max(Data$year), 20), max(Data$year)), labels = FALSE)

# Split the data by sex
DataM <- Data[Data$sex == "M", ]
DataF <- Data[Data$sex == "F", ]

mean.df <- data.frame()

# For each bin, store the male-minus-female mean difference and the sample sizes
for (i in 2:8) {
  Mcuts <- DataM[which(DataM$cuts == i), ]
  Fcuts <- DataF[which(DataF$cuts == i), ]
  Mmean <- mean(Mcuts$trait, na.rm = TRUE)
  Fmean <- mean(Fcuts$trait, na.rm = TRUE)
  mean.df[i, "bin"] <- paste(i)
  mean.df[i, "mean_dif"] <- paste(Mmean - Fmean)
  mean.df[i, "ss_f"] <- paste(length(Fcuts$cuts))
  mean.df[i, "ss_m"] <- paste(length(Mcuts$cuts))
  mean.df[i, "ss_t"] <- paste(sum(length(Fcuts$cuts), length(Mcuts$cuts)))
}

lm1 <- lm(mean_dif ~ bin, data = mean.df)
plot(mean.df$bin, mean.df$mean_dif)
abline(lm1)
summary(lm1)
```
written 2 days ago by selplat2110

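I don't really understand why you go through this whole process. Couldn't `lm(trait ~ year + sex + year:sex)` tell you what you want? Anyway, since you take the means and subtract them for each bin, you end up with 7 values for 7 different levels: `bin` is not an integer. It was created with `paste()`, so it is a character vector, and `lm()` treats it as a categorical predictor with one level per bin, which leaves no residual degrees of freedom for a significance test.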
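As a minimal sketch of the interaction model suggested above, run on simulated data: the data frame `sim` and its effect sizes are made up for illustration, and only the column names `trait`, `year`, and `sex` come from the question.

```r
# Simulated stand-in for the poster's data; all values here are hypothetical.
set.seed(1)
n <- 400
sim <- data.frame(
  year = sample(1850:2010, n, replace = TRUE),
  sex  = factor(sample(c("F", "M"), n, replace = TRUE))
)
# Build in a male-female difference that grows by 0.02 trait units per year.
sim$trait <- 10 + 0.01 * (sim$year - 1850) +
  ifelse(sim$sex == "M", 0.02 * (sim$year - 1850), 0) +
  rnorm(n, sd = 1)

# year * sex expands to year + sex + year:sex.
fit <- lm(trait ~ year * sex, data = sim)

# The year:sexM row tests whether the M - F difference changes with year,
# using every individual observation rather than one mean per bin.
coef(summary(fit))["year:sexM", ]
```

Because this model is fitted to the individual records, its residual degrees of freedom come from the full sample instead of from the number of bins.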

Yes, I've done this, and yes, it does tell me that there's an effect, but I am testing additional hypotheses that follow up on this effect.

I was able to fix it with the following, which provides a p-value:

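```r
# bin and mean_dif were filled with paste(), so convert them back to numeric
mean.df$bin <- as.numeric(mean.df$bin)
mean.df$mean_dif <- as.numeric(mean.df$mean_dif)
lm1 <- lm(mean_dif ~ bin, data = mean.df)
plot(mean.df$bin, mean.df$mean_dif)
abline(lm1)
summary(lm1)
```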

However, is this p-value usable given that the data are binned? Additionally, some bins have few or no data. Do I exclude these, or do I need confidence intervals, etc.?
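One possible way to handle the unequal bins, sketched here under assumptions and not as a definitive answer: drop bins where either sex has no observations, and weight each remaining bin mean by its sample size so that sparse bins count for less. The data frame `bins` and its numbers are invented for illustration; the column names mirror `mean.df`.

```r
# Hypothetical bin-level summary shaped like mean.df; the values are made up.
bins <- data.frame(
  bin      = 2:8,
  mean_dif = c(0.4, 0.9, NaN, 1.6, 1.8, 2.7, 3.1),
  ss_m     = c(12, 30, 0, 25, 4, 40, 33),
  ss_f     = c(15, 28, 6, 22, 3, 38, 35)
)
bins$ss_t <- bins$ss_m + bins$ss_f

# A bin with no males or no females has no defined difference, so exclude it.
usable <- subset(bins, ss_m > 0 & ss_f > 0)

# Weighted least squares: bins with more observations get more weight.
# Total n is a rough choice; it ignores the M/F split and within-bin variance.
wfit <- lm(mean_dif ~ bin, data = usable, weights = ss_t)

summary(wfit)$coefficients   # slope estimate, standard error, and p-value
confint(wfit)                # 95% confidence intervals for the coefficients
```

With only a handful of bin means, the test has very few residual degrees of freedom (here 6 usable bins minus 2 coefficients leaves 4), so a model fitted to the individual records will generally use the data more efficiently.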