Question

t-test in two groups, multiple rows

0

6 weeks ago

sooni ▴ 20

Hello.

My data frame has one row per bacterial gene and a total of 6 columns, 3 for the control group and 3 for the experimental group. I want to do a t-test between the control group and the experimental group and get, for each row (i.e. for each bacterial gene), the p-value between the two groups.

Here's the R code I used to do this:

col_t_test <- function(col) {
  WT <- col(kegg_counts[1:3])    
  PK <- col(kegg_counts[4:6])  

  t_test_result <- t.test(WT, PK)
  return(c(t_test_result$estimate, t_test_result$p.value))
}

results <- t(apply(kegg_counts, 1, col_t_test))

When I run the code above, every row gives the same result values, so something seems wrong. Is there a better way to do this?

Thank you for your help!

R t-test • 786 views

ADD COMMENT • link 5 weeks ago by sooni ▴ 20

0

One solution would be to use the t_test function from the rstatix package, which provides an easy solution to your problem.

ADD REPLY • link 6 weeks ago by DBScan ▴ 300

0

A t-test isn't appropriate for count data.

ADD REPLY • link 6 weeks ago by Michael 54k

0

I understand that you have count data (integer) with 3 replicates per group. It is important to understand where these counts are coming from to devise a good testing strategy. Please note that the accepted answer in this case would be incorrect.

ADD REPLY • link 5 weeks ago by Michael 54k

1

5 weeks ago

dariober 14k

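Perhaps the machinery for differential gene expression analysis (e.g. limma, edgeR, DESeq2) is what you are looking for. Regarding your code: you pass col as an argument, but col(...) inside the function calls R's base col() function on the whole kegg_counts data frame, so the row handed over by apply() is never used and every row returns the same values. Maybe you wanted something like this: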

col_t_test <- function(col) {
  WT <- col[1:3]
  PK <- col[4:6]

  t_test_result <- t.test(WT, PK)
  return(c(t_test_result$estimate, t_test_result$p.value))
}

Also, you probably want to apply some sort of multiple testing correction to the resulting p-values.
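
To illustrate the multiple testing point, here is a minimal sketch using base R's p.adjust. The simulated matrix m and the choice of the Benjamini-Hochberg method ("BH") are assumptions made for illustration, not something taken from the original data:

```r
# Simulated stand-in data (assumption): 100 rows, 3 control + 3 treatment columns
set.seed(42)
m <- matrix(rnorm(100 * 6), nrow = 100, ncol = 6)

# One Welch t-test per row, keeping only the p-value
pvals <- apply(m, 1, function(x) t.test(x[1:3], x[4:6])$p.value)

# Adjust for the number of tests; "BH" controls the false discovery rate
padj <- p.adjust(pvals, method = "BH")

head(data.frame(pvalue = pvals, padj = padj))
```

The adjusted values are never smaller than the raw p-values, so fewer rows pass a given threshold after correction.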

Finally, nit-picking:

"I want to do a t-test between the control group and the experimental group, and ... I want to get the p-value between the two groups."

I think it is better to think in terms of what you want to estimate and only then choose an appropriate statistic. In your case, you (probably) want to estimate the difference between groups and assess to what extent that difference is compatible with the hypothesis of no difference. For this a t-test seems reasonable but there may be better options.

ADD COMMENT • link 5 weeks ago by dariober 14k

score 2 · Accepted Answer · 2024-03-15

2

6 weeks ago

ATpoint 82k

Simplest case I can think of:

ncol <- 6
nrow <- 10000

m <- matrix(data = rnorm(ncol*nrow), nrow = nrow, ncol = ncol)

res <- lapply(1:nrow(m), function(i){

  a <- m[i, 1:3, drop = TRUE]
  b <- m[i, 4:6, drop = TRUE]
  tt <- t.test(a, b)
  d <- data.frame(pvalue = tt$p.value, t = tt$statistic)
  return(d)

})

do.call(rbind, res)

Not efficient, but still runs in < 1 second on 10000 rows, so good enough without any fancy packages.

ADD COMMENT • link 6 weeks ago by ATpoint 82k

0

The following error occurs:

Error in var(x) : is.atomic(x) is not TRUE
In addition: Warning message:
In mean.default(x) : Returns NA because the argument is not a numeric or logical type.

First of all, my original data frames are all in numeric form.

ADD REPLY • link 5 weeks ago by sooni ▴ 20

0

Posted code works for numeric matrix. Sanitize your data.
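
A likely cause of the error above, assuming the code was run on a data frame rather than a matrix: indexing a single row of a data frame returns a list even with drop = TRUE, and var() rejects lists. The sketch below uses a hypothetical kegg_counts built from random numbers; converting with as.matrix() first gives the atomic vectors that t.test() expects:

```r
# Hypothetical stand-in for the data frame (assumption): 5 rows, 6 numeric columns
set.seed(1)
kegg_counts <- as.data.frame(matrix(rnorm(5 * 6, mean = 10), nrow = 5))

# A single data frame row is a list, not an atomic vector
row_df <- kegg_counts[1, 1:3, drop = TRUE]
is.atomic(row_df)

# Confirm every column is numeric, then convert to a numeric matrix
stopifnot(all(vapply(kegg_counts, is.numeric, logical(1))))
m <- as.matrix(kegg_counts)

# The same indexing on a matrix gives a plain numeric vector
row_m <- m[1, 1:3, drop = TRUE]
is.atomic(row_m)

# t.test() now works on the matrix rows
tt <- t.test(m[1, 1:3], m[1, 4:6])
tt$p.value
```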

ADD REPLY • link 5 weeks ago by ATpoint 82k

0

I agree with @dariober: first check whether existing expert software (e.g. limma) can do this much better.
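
As a rough sketch of what the limma route could look like on count data: the simulated counts, the group labels, and the use of voom() before the linear model are all assumptions for illustration. Whether this workflow suits KEGG-annotated gene counts depends on where the counts come from, so check that before relying on it. The code is skipped if the limma package (from Bioconductor) is not installed:

```r
# Simulated stand-in counts (assumption): 200 genes, 3 control + 3 treated samples
set.seed(7)
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200, ncol = 6)
group <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)

tab <- NULL
if (requireNamespace("limma", quietly = TRUE)) {
  # voom() transforms counts to log-CPM and estimates precision weights
  v <- limma::voom(counts, design)
  # Fit one linear model per gene, then moderate the variances across genes
  fit <- limma::eBayes(limma::lmFit(v, design))
  # coef = 2 is the treated-vs-control effect; keep the original row order
  tab <- limma::topTable(fit, coef = 2, number = Inf, sort.by = "none")
}
```

If limma is available, tab holds one row per gene, with the raw p-value in P.Value and the multiplicity-adjusted one in adj.P.Val.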

ADD REPLY • link 5 weeks ago by ATpoint 82k

0

I think the OP has count data, namely bacterial gene counts annotated with KEGG categories, as I infer from the original post. Student's t-test is not appropriate for analyzing these data. It will of course deliver a p-value, but a meaningless one. Therefore, I think this solution is not correct.