Question: Difference in means between sexes over time
12 days ago, selplat21 wrote:

Hello,

I have phenotype measurements for males and females across time. I began by binning my data in 20-year increments.

Data$cuts <- cut(Data$year, breaks = c(seq(min(Data$year), max(Data$year), 20), max(Data$year)), labels = FALSE)

This assigns every individual in my dataset a bin number from 1 to 8.
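
One caveat about the cut() call above, sketched with made-up years (the vector years is hypothetical): by default cut() uses intervals that are open on the left, so an observation equal to the smallest break gets NA unless include.lowest = TRUE is set. If the year range happens to be an exact multiple of 20, the repeated max(Data$year) would also make the breaks non-unique, which cut() rejects; wrapping the breaks in unique() avoids that.

```r
# Hypothetical years, chosen so the breaks give 8 bins like in the post
years <- c(1850, 1855, 1870, 1890, 1931, 2009)
breaks <- unique(c(seq(min(years), max(years), 20), max(years)))

# Default behaviour: intervals are (lower, upper], so the minimum year is left out
bins_default <- cut(years, breaks = breaks, labels = FALSE)

# include.lowest = TRUE closes the first interval on the left as well
bins_fixed <- cut(years, breaks = breaks, labels = FALSE, include.lowest = TRUE)

print(bins_default)  # first element is NA
print(bins_fixed)    # first element is 1
```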

I am then trying to output the difference in means between males and females for a trait in each time bin.

for (i in 1:8) {
  difmean <- c()
  Mcuts <- DataM[ which(DataM$cuts=='i'),]
  Fcuts <- DataF[ which(DataF$cuts=='i'),]
  Mmean <- mean(Mcuts$trait, na.rm = TRUE)
  Fmean <- mean(Fcuts$trait, na.rm = TRUE)
  difmean <- c(Mmean-Fmean)
  print (difmean)
}

I get an output of the following:

[1] NaN
[1] NaN
[1] NaN
[1] NaN
[1] NaN
[1] NaN
[1] NaN
[1] NaN

Any help would be greatly appreciated!


Got it: you wrote 'i' (a string) instead of the loop variable i in DataM$cuts=='i', and cuts never equals the string 'i'.

Reply written 12 days ago by Asaf
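
The fix described above, as a minimal sketch on a made-up data frame (the toy values are hypothetical; the column names follow the post). It also moves the result vector out of the loop so that it is not reset on every iteration.

```r
# Toy data: two time bins, three individuals each (values are made up)
Data <- data.frame(
  sex   = c("M", "F", "M", "F", "M", "F"),
  cuts  = c(1, 1, 1, 2, 2, 2),
  trait = c(10, 8, 12, 7, 9, NA)
)
DataM <- Data[Data$sex == "M", ]
DataF <- Data[Data$sex == "F", ]

n_bins <- max(Data$cuts)
difmean <- numeric(n_bins)  # allocated once, outside the loop

for (i in seq_len(n_bins)) {
  # Compare against the loop variable i, not the string 'i'
  Mmean <- mean(DataM$trait[DataM$cuts == i], na.rm = TRUE)
  Fmean <- mean(DataF$trait[DataF$cuts == i], na.rm = TRUE)
  difmean[i] <- Mmean - Fmean
}
print(difmean)

# With the quoted 'i' no row matches, and the mean of an empty vector is NaN
quoted <- mean(DataM$trait[DataM$cuts == 'i'], na.rm = TRUE)
print(quoted)
```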

Thank you!! It is working now, much appreciated.

Reply written 12 days ago by selplat21

Is there a way to assess significance of a linear model with binned data? I pasted some code below that generates the regression line, but I don't get p-values from the summary. Maybe I need to bootstrap and just look at confidence intervals?

Reply written 8 days ago by selplat21
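
For the bootstrap idea mentioned above, a minimal sketch on simulated data (all values are made up; only the column names follow the thread): rows are resampled with replacement, the model trait ~ year * sex is refit each time, and a percentile interval for the year:sexM coefficient is reported. Using the unbinned model here is an assumption, not something from the thread.

```r
set.seed(42)

# Simulated stand-in data: the male trait drifts upward over time, the female trait does not
n <- 300
Data <- data.frame(
  year = sample(1850:2009, n, replace = TRUE),
  sex  = sample(c("M", "F"), n, replace = TRUE)
)
Data$trait <- 10 + 0.01 * (Data$year - 1850) * (Data$sex == "M") + rnorm(n)

# Statistic of interest: how the male-female difference changes per year
slope_dif <- function(d) {
  coef(lm(trait ~ year * sex, data = d))[["year:sexM"]]
}

# Resample rows with replacement and refit the model 500 times
boot_est <- replicate(500, slope_dif(Data[sample(nrow(Data), replace = TRUE), ]))

# Percentile 95% interval; an interval excluding 0 suggests a trend in the difference
ci <- quantile(boot_est, c(0.025, 0.975))
print(ci)
```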

I think you should start a new thread for that question.

Reply written 7 days ago by Asaf

Do Data and DataM and DataF have the same number of rows? Is trait a column in DataM and DataF?

Reply written 12 days ago by Asaf

DataM and DataF have different numbers of rows, but the same columns. $trait is a column in both datasets.

DataM and DataF were generated like so:

DataM <- Data[which(Data$sex=="M"),]
DataF <- Data[which(Data$sex=="F"),]
Reply written 12 days ago by selplat21 (modified 12 days ago by RamRS)

Side note: Why use which() when just specifying DataM<-Data[Data$sex=="M",] would work just fine?

Reply written 12 days ago by RamRS
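
The two forms can differ when the sex column contains missing values. A small sketch with a made-up data frame (toy is hypothetical): logical indexing keeps an all-NA row for each NA in the condition, while which() drops those positions.

```r
# Made-up data with one missing sex value
toy <- data.frame(
  sex   = c("M", "F", NA, "M"),
  trait = c(1, 2, 3, 4)
)

by_logical <- toy[toy$sex == "M", ]         # NA in the condition yields an NA row
by_which   <- toy[which(toy$sex == "M"), ]  # which() keeps only TRUE positions

print(nrow(by_logical))
print(nrow(by_which))
```

If sex has no missing values, both forms return the same rows.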

You're right, it was just how I left it during processing.

Reply written 12 days ago by selplat21
11 days ago, selplat21 wrote:

Update:

I was able to loop through and provide a mean difference, sample size for each sex, and total sample size.

Data$cuts <- cut(Data$year, breaks = c(seq(min(Data$year), max(Data$year), 20), max(Data$year)), labels = FALSE)

DataM <- Data[Data$sex=="M",]
DataF <- Data[Data$sex=="F",]

mean.df <- as.data.frame(c())

for (i in 2:8) {
  Mcuts <- DataM[which(DataM$cuts==i),]
  Fcuts <- DataF[which(DataF$cuts==i),]
  Mmean <- mean(Mcuts$trait, na.rm = TRUE)
  Fmean <- mean(Fcuts$trait, na.rm = TRUE)
  mean.df[i, "bin"] <- paste(i)
  mean.df[i, "mean_dif"] <- paste(Mmean-Fmean)
  mean.df[i, "ss_f"] <- paste(length(Fcuts$cuts))
  mean.df[i, "ss_m"] <- paste(length(Mcuts$cuts))
  mean.df[i, "ss_t"] <- paste(sum(length(Fcuts$cuts),length(Mcuts$cuts)))
  }

lm1 <- lm(mean_dif ~ bin, data=mean.df)
plot(mean.df$bin, mean.df$mean_dif)
abline(lm1)
summary(lm1)

Unfortunately, because this is binned data, the lm() command is unable to produce p-values. Is there a way to assess significance of the above trendline with binned data and account for the different sample sizes of bins?
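
One likely reason for the missing p-values, assuming the code ran as posted: paste() stores bin and mean_dif as character strings, so lm() treats bin as a factor with one level per row and no residual degrees of freedom remain. Binning by itself does not prevent lm() from reporting p-values. The sketch below uses simulated data (all values are made up; only the column names come from the post), keeps the summaries numeric, and weights each bin by its total sample size. Fitting trait ~ year * sex on the unbinned rows is shown as an alternative, where the year:sex interaction asks the same question without binning.

```r
set.seed(1)

# Simulated stand-in data: the male trait drifts upward over time, the female trait does not
n <- 400
Data <- data.frame(
  year = sample(1850:2009, n, replace = TRUE),
  sex  = sample(c("M", "F"), n, replace = TRUE)
)
Data$trait <- 10 + 0.01 * (Data$year - 1850) * (Data$sex == "M") + rnorm(n)

# 20-year bins; unique() and include.lowest guard against duplicate breaks and a dropped minimum
breaks <- unique(c(seq(min(Data$year), max(Data$year), 20), max(Data$year)))
Data$cuts <- cut(Data$year, breaks = breaks, labels = FALSE, include.lowest = TRUE)

# Per-bin summaries kept numeric (no paste()), one row per bin
bins <- sort(unique(Data$cuts))
mean.df <- data.frame(
  bin = bins,
  mean_dif = sapply(bins, function(b) {
    mean(Data$trait[Data$cuts == b & Data$sex == "M"], na.rm = TRUE) -
      mean(Data$trait[Data$cuts == b & Data$sex == "F"], na.rm = TRUE)
  }),
  ss_t = sapply(bins, function(b) sum(Data$cuts == b))
)

# Weighted regression on the bin summaries: larger bins count for more
lm_w <- lm(mean_dif ~ bin, data = mean.df, weights = ss_t)
coef_w <- summary(lm_w)$coefficients
print(coef_w)

# Alternative without binning: the year:sexM row tests whether the sex difference changes over time
lm_raw <- lm(trait ~ year * sex, data = Data)
coef_raw <- summary(lm_raw)$coefficients
print(coef_raw)
```

Weighting by ss_t is only one reasonable choice; the variance of a difference in means depends on both group sizes, so 1 / (1/ss_m + 1/ss_f) would be another option.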
