Question: Two Sample t-test with bootstrapping for gene expression matrix in R
Expert wrote, 13 days ago:

I tried Bioconductor packages for differential gene expression analysis, such as edgeR, DESeq2, and limma, and obtained differentially expressed genes with these methods. I want to compare my results with a two-sample t-test with bootstrapping, but I do not understand this method very well. For example, my table has 5 control and 5 treatment samples (columns) and 1000 genes (rows). Can we find differentially expressed genes by applying a bootstrap two-sample t-test to each gene? I scanned literature, but I could not find good solutions for gene expression analysis. I tried the code below, running a t-test with the "boot" package:

library(boot)
boot.tee <- function(data, i){
  data <- as.matrix(data)
  for (i in 1:data) {
    t.test(sample(data[i,1:5], 5, replace=T), sample(data[i,6:10], 5, replace=T), paired = FALSE)$p.value
  }
}
boot.out <- boot(data=LogT_matrix, statistic=boot.tee, R=10)

Then I received a warning message:

In 1:data : numerical expression has 10000 elements: only the first used

On this page, http://ww2.coastal.edu/kingw/statistics/R-tutorials/resample.html, there are some examples, but I want to obtain p-values for all genes in my table, like the topTable and topTags tables in the limma and edgeR packages. I can run a standard t-test on my data, but I could not adapt it to a bootstrap t-test. Can Bootstrap Statistics be applied to each gene? Thank you.


"I scanned literature, but I could not find good solutions for gene expression analysis."

Of course not. t-tests are not well suited to gene expression (or any high-throughput) assays with limited replicate numbers such as 5 vs 5; this is why specialized tools such as DESeq2 and edgeR were developed. Don't reinvent the wheel or waste your time: use them rather than homebrew methods. If t-tests with permutation were a valid option, the field would be applying these very obvious options, and statisticians would not have spent their time developing alternatives.

Comment by ATpoint, written 13 days ago, modified 12 days ago

The example can be scaled up: 10 controls and 30 treatments may also be available. I just gave a small example to run the code. The bootstrap t-test is a powerful test, and the results obtained can be used for meta-analysis. The purpose of statistics is to develop appropriate methods.

Reply by Expert, written and modified 13 days ago

The t-test is for continuous data, and RNA-seq data are discrete. I am still not sure why you want to torture the data so much, when most available software will appropriately model the count data using the negative binomial distribution.

Reply by rpolicastro, written and modified 12 days ago
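
To make the point about discrete, negative binomial counts concrete, here is a minimal sketch in base R using simulated data. The values of mu and size are hypothetical and chosen only for illustration; they do not come from any real dataset. It shows that such counts are integers and that their variance is much larger than their mean, which is the overdispersion that count-based tools are designed to model.

```r
set.seed(42)

mu   <- 100   # hypothetical mean count for one gene
size <- 5     # hypothetical dispersion parameter of rnbinom()

# Simulate counts; with this parameterisation the variance is mu + mu^2 / size
counts <- rnbinom(10000, mu = mu, size = size)

# The simulated values are whole numbers, i.e. discrete
all(counts == round(counts))

# Compare the sample mean and variance with the theoretical NB variance
c(mean = mean(counts),
  variance = var(counts),
  nb_variance = mu + mu^2 / size)
```

A t-test on such data is usually run on a log-transformed matrix (as the name LogT_matrix in the question suggests), which is an approximation rather than a model of the counts themselves.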
e.rempel (Germany, Heidelberg) wrote, 12 days ago:

"Can Bootstrap Statistics be applied to each gene?"

I think it can. I suspect there is an error in your code. The warning message implies that you should replace the expression

1:data

with

1:nrow(data)

since i iterates over rows (genes), right?
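
Note that changing 1:data to 1:nrow(data) removes the warning, but the function in the question would still return NULL (a for loop in R does not return a value), and the loop variable i overwrites the index vector that boot() passes in. Below is a minimal sketch of one common way to run a bootstrap two-sample t-test per gene, written in base R so the resampling is explicit. The helper names welch_t and boot_t_pvalue are hypothetical, the data are simulated stand-ins for LogT_matrix, and the column layout (1:5 control, 6:10 treatment) follows the question. The procedure shifts both groups to a common mean so that resampling mimics the null hypothesis; this is an assumption about what is wanted, not necessarily the method the original code intended.

```r
# Welch t statistic computed by hand, so constant resamples do not raise errors
welch_t <- function(x, y) {
  (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))
}

# Bootstrap p-value for one gene; x = control values, y = treatment values
boot_t_pvalue <- function(x, y, R = 999) {
  t_obs <- welch_t(x, y)
  if (!is.finite(t_obs)) return(NA_real_)   # e.g. a gene that is constant everywhere
  pooled_mean <- mean(c(x, y))
  # Shift both groups to a common mean so that resampling reflects the null
  x0 <- x - mean(x) + pooled_mean
  y0 <- y - mean(y) + pooled_mean
  t_star <- replicate(R, welch_t(sample(x0, replace = TRUE),
                                 sample(y0, replace = TRUE)))
  t_star <- t_star[!is.nan(t_star)]         # drop 0/0 cases from degenerate resamples
  (sum(abs(t_star) >= abs(t_obs)) + 1) / (length(t_star) + 1)
}

# Simulated log-expression matrix: 50 genes, 5 control + 5 treatment columns
set.seed(1)
n_genes <- 50
LogT_matrix <- matrix(rnorm(n_genes * 10), nrow = n_genes,
                      dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
LogT_matrix[1:5, 6:10] <- LogT_matrix[1:5, 6:10] + 3   # spike in 5 true differences

# One bootstrap p-value per gene, then Benjamini-Hochberg adjustment
p_boot <- apply(LogT_matrix, 1, function(g) boot_t_pvalue(g[1:5], g[6:10]))
result <- data.frame(p_value = p_boot, p_adj = p.adjust(p_boot, method = "BH"))
head(result[order(result$p_value), ])
```

With R resamples the smallest attainable p-value is 1 / (R + 1), so R = 10 as in the question is far too few for a genome-wide table. With only 5 samples per group the number of distinct resamples is also small, so the results should be read with caution.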

The point raised by ATpoint is an interesting one. You have to find a balance: use existing, accepted code, but also try things out, compare them with existing tools, and get your hands dirty.
