t-test in two groups, multiple rows
6 months ago
sooni ▴ 20

Hello.

Each row of my data frame is a bacterial gene, and there are 6 columns in total: 3 for the control group and 3 for the experimental group. I want to do a t-test between the control group and the experimental group for each row, i.e. for each bacterial gene, and get the p-value between the two groups.

Here's the R code I used to do this:

col_t_test <- function(col) {
  WT <- col(kegg_counts[1:3])    
  PK <- col(kegg_counts[4:6])  

  t_test_result <- t.test(WT, PK)
  return(c(t_test_result$estimate, t_test_result$p.value))
}

results <- t(apply(kegg_counts, 1, col_t_test))

When I run the code above, every row gives the same result values, so something seems wrong. Is there a better way to do this?

Thank you for your help!

R t-test • 1.1k views

One solution is to use the t_test function from the rstatix package, which makes this kind of per-group testing easy.

A t-test isn't appropriate for count data.

I understand that you have count data (integers) with 3 replicates per group. It is important to understand where these counts come from in order to devise a good testing strategy. Please note that the accepted answer would be incorrect in this case.

6 months ago
ATpoint 84k

Simplest case I can think of:

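ncol <- 6
nrow <- 10000

# Simulated numeric matrix: one row per feature, columns 1-3 and 4-6 are the two groups
m <- matrix(data = rnorm(ncol*nrow), nrow = nrow, ncol = ncol)

# One t-test per row, collected as a list of one-row data frames
res <- lapply(1:nrow(m), function(i){

  a <- m[i, 1:3, drop = TRUE]
  b <- m[i, 4:6, drop = TRUE]
  tt <- t.test(a, b)
  d <- data.frame(pvalue = tt$p.value, t = tt$statistic)
  return(d)

})

# Stack the per-row results into a single data frame
do.call(rbind, res)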

Not efficient, but still runs in < 1 second on 10000 rows, so good enough without any fancy packages.
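
If the per-row loop ever does become a bottleneck, the same statistic can be computed for all rows at once in base R. The following is only a sketch, assuming a numeric matrix with the two groups in known columns; `row_welch_t` and its argument names are made up for illustration. It reproduces the default `t.test(a, b)` (Welch, unequal variances).

```r
# Vectorized Welch t-test for every row of a numeric matrix (base R only).
# `row_welch_t`, `idx_a` and `idx_b` are hypothetical names for this sketch.
row_welch_t <- function(m, idx_a, idx_b) {
  a <- m[, idx_a, drop = FALSE]
  b <- m[, idx_b, drop = FALSE]
  na <- ncol(a)
  nb <- ncol(b)

  mean_a <- rowMeans(a)
  mean_b <- rowMeans(b)
  # Unbiased per-row sample variances
  var_a <- rowSums((a - mean_a)^2) / (na - 1)
  var_b <- rowSums((b - mean_b)^2) / (nb - 1)

  # Squared standard error of the difference in means
  se2 <- var_a / na + var_b / nb
  t_stat <- (mean_a - mean_b) / sqrt(se2)
  # Welch-Satterthwaite degrees of freedom
  df <- se2^2 / ((var_a / na)^2 / (na - 1) + (var_b / nb)^2 / (nb - 1))

  data.frame(t = t_stat, df = df, pvalue = 2 * pt(-abs(t_stat), df))
}

set.seed(1)
m <- matrix(rnorm(6 * 10000), nrow = 10000, ncol = 6)
res_fast <- row_welch_t(m, 1:3, 4:6)
head(res_fast)
```

Spot-checking a few rows against `t.test()` is a cheap way to make sure the column indices are what you think they are.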

The following error occurs:

Error in var(x) : is.atomic(x) is not TRUE
In addition: Warning message:
In mean.default(x) : Returns NA because the argument is not a numeric or logical type.

Note that all columns in my original data frame are numeric.

Posted code works for numeric matrix. Sanitize your data.
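
A likely cause of the error above is running the code on a data frame rather than a matrix: a single row of a data frame extracted with `drop = TRUE` is a list, not a numeric vector. Here is a minimal sketch of the check and the conversion; `kegg_counts_demo` and its column names are made up and only stand in for the real data.

```r
# Toy stand-in for the poster's data frame (hypothetical names and values).
kegg_counts_demo <- data.frame(
  WT1 = c(12, 30, 7), WT2 = c(15, 28, 9), WT3 = c(11, 35, 6),
  PK1 = c(22, 31, 2), PK2 = c(25, 27, 3), PK3 = c(19, 33, 1)
)

# One row of a data frame comes back as a list, which t.test() cannot use.
row_df <- kegg_counts_demo[1, 1:3, drop = TRUE]
is.list(row_df)    # TRUE
is.atomic(row_df)  # FALSE

# Confirm every column really is numeric before converting.
stopifnot(all(vapply(kegg_counts_demo, is.numeric, logical(1))))

# Convert once, then index rows of the matrix instead.
m <- as.matrix(kegg_counts_demo)
row_m <- m[1, 1:3, drop = TRUE]
is.atomic(row_m)   # TRUE

p <- t.test(m[1, 1:3], m[1, 4:6])$p.value
p
```

If the `vapply()` check fails, some column is a character or factor, and `as.matrix()` would silently produce a character matrix, so it is worth fixing that column first.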

I agree with @dariober: first check whether existing specialized software (e.g. limma) can do this much better.

From the original post, I infer that the OP has count data: bacterial gene counts annotated to KEGG categories. Student's t-test is not appropriate for analyzing these data. It will of course deliver a p-value, but a meaningless one. Therefore, I think this solution is not correct.

It was my mistake. After converting the data frame to a matrix, it works! Thank you!

6 months ago

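Perhaps the machinery for differential gene expression analysis (e.g. limma, edgeR, DESeq2) is what you are looking for. Regarding your code: you pass col as an argument, but then you call it like a function, so R ends up calling the base col() function on kegg_counts instead of indexing the row that apply() passes in. That call does not depend on the row, which is why every row gives the same result. Maybe you wanted something like this: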

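col_t_test <- function(col) {
  # col is one row of the data, passed in by apply(); index it directly
  WT <- col[1:3]  # first three columns
  PK <- col[4:6]  # last three columns

  t_test_result <- t.test(WT, PK)
  # Return both group means followed by the p-value
  return(c(t_test_result$estimate, t_test_result$p.value))
}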

Also, you probably want to apply some sort of multiple testing correction to the resulting p-values.
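
As a rough sketch of what that correction could look like: base R's `p.adjust()` covers the usual methods. Benjamini-Hochberg (`method = "BH"`) is used here as one common choice, which is my assumption since no specific method is named above. The matrix is simulated, and the result names (`mean_WT`, `mean_PK`, `p_value`, `p_adj`) are made up for the example.

```r
# Simulated numeric matrix: rows are features, columns 1-3 and 4-6 are the groups.
set.seed(42)
m <- matrix(rnorm(6 * 1000), nrow = 1000, ncol = 6)

# Row-wise t-test returning named values (names are illustrative only).
col_t_test <- function(col) {
  tt <- t.test(col[1:3], col[4:6])
  c(mean_WT = unname(tt$estimate[1]),
    mean_PK = unname(tt$estimate[2]),
    p_value = tt$p.value)
}

# apply() returns one column per row of m, so transpose to get one row per feature.
results <- as.data.frame(t(apply(m, 1, col_t_test)))

# Adjust all p-values together; BH controls the false discovery rate.
results$p_adj <- p.adjust(results$p_value, method = "BH")

# Number of rows passing a 5% FDR threshold.
sum(results$p_adj < 0.05)
```

The adjustment has to be done on the full vector of p-values in one call, not row by row.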

Finally, nit-picking:

I want to do a t-test between the control group and the experimental group, and ... I want to get the p-value between the two groups.

I think it is better to think in terms of what you want to estimate and only then choose an appropriate statistic. In your case, you (probably) want to estimate the difference between groups and assess to what extent that difference is compatible with the hypothesis of no difference. For this a t-test seems reasonable, but there may be better options.
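
To make the "estimate first" idea concrete, here is a small sketch that pulls the estimated difference and its confidence interval out of `t.test()` alongside the p-value. The numbers and the names `wt`, `pk`, `diff_est` are invented for illustration and are not from the original data.

```r
# Invented example values for one feature, three replicates per group.
wt <- c(10.2, 11.5, 9.8)
pk <- c(13.1, 12.7, 14.0)

# With t.test(x, y), the confidence interval is for mean(x) - mean(y).
tt <- t.test(pk, wt)

# Estimated difference in means (experimental minus control).
diff_est <- unname(tt$estimate[1] - tt$estimate[2])

# 95% confidence interval for that difference.
ci <- tt$conf.int

c(difference = diff_est, lower = ci[1], upper = ci[2], p_value = tt$p.value)
```

Reporting the difference with its interval says how large the effect might be, which a p-value alone does not.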
