t-test in two groups, multiple rows
3 months ago
sooni ▴ 20

Hello.

My data frame has one row per bacterial gene and 6 columns in total: 3 for the control group and 3 for the experimental group. I want to run a t-test between the control group and the experimental group for each row, i.e. for each gene, and get the p-value between the two groups.

Here's the R code I used to do this:

col_t_test <- function(col) {
WT <- col(kegg_counts[1:3])
PK <- col(kegg_counts[4:6])

t_test_result <- t.test(WT, PK)
return(c(t_test_result$estimate, t_test_result$p.value))
}

results <- t(apply(kegg_counts, 1, col_t_test))


When I run the code above, every row returns the same values, so something seems wrong. Is there a better way to do this?

Thank you for your help!

R t-test • 921 views

One solution would be to use the t_test function from the rstatix package, which makes this straightforward.


A t-test isn't appropriate for count data.


I understand that you have count data (integer) with 3 replicates per group. It is important to understand where these counts come from in order to devise a good testing strategy. Please note that the accepted answer would be incorrect in this case.

3 months ago
ATpoint 83k

Simplest case I can think of:

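n_col <- 6
n_row <- 10000

# Simulated data: columns 1-3 are group A, columns 4-6 are group B
m <- matrix(data = rnorm(n_col * n_row), nrow = n_row, ncol = n_col)

res <- lapply(seq_len(nrow(m)), function(i) {
  a <- m[i, 1:3, drop = TRUE]
  b <- m[i, 4:6, drop = TRUE]
  tt <- t.test(a, b)  # Welch two-sample t-test (the default in R)
  data.frame(pvalue = tt$p.value, t = tt$statistic)
})

# One row of results per row of m
do.call(rbind, res)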


Not efficient, but still runs in < 1 second on 10000 rows, so good enough without any fancy packages.
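
If speed ever does matter, the same Welch statistics can be computed for all rows at once with plain vectorised arithmetic. This is only a sketch under the same 6-column layout; `row_welch` is a hypothetical helper name, not a function from any package:

```r
# Hypothetical helper: Welch two-sample t-test for every row of a numeric matrix
row_welch <- function(m, idx_a = 1:3, idx_b = 4:6) {
  a <- m[, idx_a, drop = FALSE]
  b <- m[, idx_b, drop = FALSE]
  na <- ncol(a)
  nb <- ncol(b)
  # Per-row squared standard error of the mean for each group
  sa <- rowSums((a - rowMeans(a))^2) / (na - 1) / na
  sb <- rowSums((b - rowMeans(b))^2) / (nb - 1) / nb
  t_stat <- (rowMeans(a) - rowMeans(b)) / sqrt(sa + sb)
  # Welch-Satterthwaite degrees of freedom
  df <- (sa + sb)^2 / (sa^2 / (na - 1) + sb^2 / (nb - 1))
  data.frame(t = t_stat, pvalue = 2 * pt(-abs(t_stat), df))
}

set.seed(1)
m <- matrix(rnorm(6 * 1000), ncol = 6)
fast <- row_welch(m)
head(fast)
```

The values should agree with calling `t.test` row by row, so the loop above remains a handy reference for checking it.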


The following error occurs:

Error in var(x) : is.atomic(x) is not TRUE
In mean.default(x) : argument is not numeric or logical: returning NA


First of all, my original data frames are all in numeric form.


The posted code works on a numeric matrix. Sanitize your data.


I agree with @dariober: first make sure that existing specialist software (e.g. limma) cannot do this much better.


I think the OP has count data from bacterial gene counts annotated to KEGG categories, as I infer from the original post. Student's t-test is not appropriate for analyzing these data. It will of course deliver a p-value, but a meaningless one. Therefore, I think this solution is not correct.


It was my mistake. I converted the data frame to a matrix and it works! Thank you!
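
For anyone who hits the same error: indexing a single row of a data frame gives back a list-like object rather than a numeric vector, which `t.test` cannot handle, whereas a matrix row is a plain vector. A small sketch with a made-up data frame (`df` and its column names are illustrative only, not the OP's data):

```r
# Made-up example data: 2 genes, 3 control (WT) and 3 treated (PK) samples
df <- data.frame(WT1 = c(5, 8), WT2 = c(6, 9), WT3 = c(7, 12),
                 PK1 = c(10, 8), PK2 = c(12, 11), PK3 = c(15, 9))

row_df <- df[1, 1:3, drop = TRUE]  # a list, even with drop = TRUE
is.atomic(row_df)                  # FALSE

m <- as.matrix(df)                 # numeric matrix
row_m <- m[1, 1:3, drop = TRUE]    # plain numeric vector
is.atomic(row_m)                   # TRUE

tt <- t.test(row_m, m[1, 4:6])
tt$p.value
```

Before converting, it is worth checking with `str(df)` that every column really is numeric, because `as.matrix` silently produces a character matrix if any column is not.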

3 months ago

Perhaps the machinery for differential gene expression analysis (e.g. limma, edgeR, DESeq2) is what you are looking for. Regarding your code: you pass col as an argument, but then call col(...) as a function on kegg_counts, so R uses the base function col() and the row supplied by apply is never used. Maybe you wanted something like this:

col_t_test <- function(col) {
WT <- col[1:3]
PK <- col[4:6]

t_test_result <- t.test(WT, PK)
return(c(t_test_result$estimate, t_test_result$p.value))
}
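
To make the difference concrete, here is a sketch that runs the corrected function on a made-up `kegg_counts` (the values are illustrative, not the OP's data). It also shows what base R's `col()` returns, which is why the original version produced the same result for every row:

```r
# Made-up example data: 2 rows, 3 WT and 3 PK columns
kegg_counts <- data.frame(WT1 = c(5, 80), WT2 = c(6, 95), WT3 = c(7, 120),
                          PK1 = c(10, 85), PK2 = c(12, 110), PK3 = c(15, 90))

col_t_test <- function(col) {
  WT <- col[1:3]
  PK <- col[4:6]
  t_test_result <- t.test(WT, PK)
  c(t_test_result$estimate, p.value = t_test_result$p.value)
}

# base::col() ignores the data values and returns column indices only
col(kegg_counts[1:3])

# apply() coerces the data frame to a matrix and passes one row at a time
results <- t(apply(kegg_counts, 1, col_t_test))
results
```

Each row of `results` holds the two group means and the p-value for one row of the input.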


Also, you probably want to apply some sort of multiple testing correction to the resulting p-values.
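
In base R this can be done with `p.adjust`. A minimal sketch on simulated p-values; choosing Benjamini-Hochberg here is an assumption, so pick whichever method suits your study:

```r
set.seed(42)
m <- matrix(rnorm(6 * 200), ncol = 6)  # simulated data with no true differences
pvals <- apply(m, 1, function(x) t.test(x[1:3], x[4:6])$p.value)

padj <- p.adjust(pvals, method = "BH")  # Benjamini-Hochberg false discovery rate

sum(pvals < 0.05)  # number of rows called significant before correction
sum(padj < 0.05)   # number of rows called significant after correction
```

With only 3 replicates per group, few rows are likely to survive correction, which is one more reason to look at methods that share information across genes.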

Finally, nit-picking:

I want to do a t-test between the control group and the experimental group, and ... I want to get the p-value between the two groups.

I think it is better to think in terms of what you want to estimate and only then choose an appropriate statistic. In your case, you (probably) want to estimate the difference between groups and assess to what extent that difference is compatible with the hypothesis of no difference. For this a t-test seems reasonable, but there may be better options.