Question: Significance testing on regression with binned data


selplat21 wrote:

I have binned the following data by year, but I am wondering how to assess the significance of the resulting regression, given that the data are binned. I would like to run this regression separately for each of several traits, against time binned in years.

```
Data$cuts <- cut(Data$year, breaks = c(seq(min(Data$year), max(Data$year), 20), max(Data$year)), labels = FALSE)
DataM <- Data[Data$sex=="M",]
DataF <- Data[Data$sex=="F",]
mean.df <- as.data.frame(c())
for (i in 2:8) {
  # Male and female records that fall in bin i
  Mcuts <- DataM[which(DataM$cuts==i),]
  Fcuts <- DataF[which(DataF$cuts==i),]
  Mmean <- mean(Mcuts$trait, na.rm = TRUE)
  Fmean <- mean(Fcuts$trait, na.rm = TRUE)
  mean.df[i, "bin"] <- paste(i)
  mean.df[i, "mean_dif"] <- paste(Mmean-Fmean)
  # Sample sizes: females, males, total
  mean.df[i, "ss_f"] <- paste(length(Fcuts$cuts))
  mean.df[i, "ss_m"] <- paste(length(Mcuts$cuts))
  mean.df[i, "ss_t"] <- paste(sum(length(Fcuts$cuts),length(Mcuts$cuts)))
}
lm1 <- lm(mean_dif ~ bin, data=mean.df)
plot(mean.df$bin, mean.df$mean_dif)
abline(lm1)
summary(lm1)
```

I don't really understand why you go through this whole process. Couldn't `lm(trait ~ year + sex + year:sex)` tell you what you want? Anyway, since you take the means and subtract them for each bin, you end up with 7 values for 7 different levels, because `bin` is not an integer.

Yes, I've done this, and yes, it does tell me that there's an effect, but I am testing additional hypotheses that follow up on this effect.
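For anyone following along, here is a minimal sketch of the interaction model suggested above, fitted to the unbinned records. The data are simulated and the effect sizes are invented; only the column names `year`, `sex`, and `trait` come from the question. The `year:sex` coefficient estimates how much the trait's slope over time differs between the sexes, which is roughly what a regression of binned mean differences is trying to capture.

```r
# Simulated stand-in for `Data`; values and effect sizes are assumptions.
set.seed(1)
n <- 400
Data <- data.frame(
  year = sample(1850:2010, n, replace = TRUE),
  sex  = sample(c("F", "M"), n, replace = TRUE)
)
# Build in a male-female difference that widens over time.
Data$trait <- 10 + 0.01 * (Data$year - 1850) +
  ifelse(Data$sex == "M", 0.02 * (Data$year - 1850), 0) +
  rnorm(n)

# year:sex tests whether the slope over time differs between the sexes.
fit <- lm(trait ~ year + sex + year:sex, data = Data)

# Estimate, standard error, t value, and p-value for the interaction term.
coef(summary(fit))["year:sexM", ]
```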

I was able to fix it with the following, which provides a p-value:

However, is this p-value usable, given that the data are binned? Additionally, some bins have little or no data. Should I exclude these, or do I need confidence intervals, etc.?
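One way to approach the binned version is sketched below, as an illustration rather than a definitive answer. It keeps `bin` and `mean_dif` numeric (no `paste()`), drops bins in which either sex has fewer than a hypothetical `min_n` observations, and weights each bin by `n_m * n_f / (n_m + n_f)`, which is inversely proportional to the variance of a difference in means if the within-group variance is assumed constant. The data are simulated, and `min_n`, `n_m`, `n_f`, and `w` are names introduced here for illustration.

```r
# Simulated stand-in for `Data`; values and effect sizes are assumptions.
set.seed(2)
n <- 400
Data <- data.frame(
  year = sample(1850:2010, n, replace = TRUE),
  sex  = sample(c("F", "M"), n, replace = TRUE)
)
Data$trait <- 10 +
  ifelse(Data$sex == "M", 0.02 * (Data$year - 1850), 0) + rnorm(n)

# 20-year bins. unique() avoids a duplicated final break, and
# include.lowest = TRUE keeps the earliest year in the first bin.
brks <- unique(c(seq(min(Data$year), max(Data$year), by = 20), max(Data$year)))
Data$bin <- cut(Data$year, breaks = brks, labels = FALSE, include.lowest = TRUE)

# One row per bin, with numeric columns throughout.
mean.df <- do.call(rbind, lapply(sort(unique(Data$bin)), function(b) {
  m <- Data$trait[Data$bin == b & Data$sex == "M"]
  f <- Data$trait[Data$bin == b & Data$sex == "F"]
  data.frame(bin = b, mean_dif = mean(m) - mean(f),
             n_m = length(m), n_f = length(f))
}))

# Drop sparse bins; the threshold of 5 is an arbitrary assumption.
min_n <- 5
mean.df <- mean.df[mean.df$n_m >= min_n & mean.df$n_f >= min_n, ]

# Weight each bin by the precision of its mean difference.
mean.df$w <- with(mean.df, n_m * n_f / (n_m + n_f))
fit_w <- lm(mean_dif ~ bin, data = mean.df, weights = w)

coef(summary(fit_w))  # slope estimate and p-value
confint(fit_w)        # 95% confidence intervals
```

With only a handful of bins, this test has very few degrees of freedom, so the p-value from a model fitted to the unbinned records is usually the more informative of the two.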
