Code for multiclass differential gene expression analysis
0
0
Entering edit mode
2.4 years ago
orzech_mag ▴ 230

Dear Collegues,

I am trying to perform analysis of differentially expressed genes between multiple groups. For that I found some tutorial (https://jef.works/blog/2017/05/31/multiclass-diffential-expression-analysis/). I am interested in t-test only between one group vs rest. This is corresponding code part:

pv.sig <- lapply(levels(groups), function(g){
ingroup <- names(groups)[groups %in% g]
outgroup <- names(groups)[!(groups %in% g)]
pv <- sapply(1:M, function(i) {
t.test(x[i,ingroup], x[i,outgroup])$p.value }) names(pv) <- rownames(x) pv.sig <- names(pv)[pv < 0.05/M/G] ## bonferonni pv.sig })  When I run it on data created in the tutorial everything is fine. When I run it on my data each time I get such error: Error in 'x[i, ingroup]': subscript out of bounds  This is how my data looks like: my data My example file may be downloaded here: example data This is how my code looks like: x = read.table("~/Desktop/DAT.txt", sep = "\t", dec = ".", header = T) groups = c(rep("B", 505), rep("C", 304), rep("O", 301), rep("U", 370)) groups = as.factor(groups) rownames(x) = x$X
x$X = NULL x = as.matrix(x) names(groups) <- colnames(x) M = 1480 G = 4 pv.sig <- lapply(levels(groups), function(g){ ingroup <- names(groups)[groups %in% g] outgroup <- names(groups)[!(groups %in% g)] pv <- sapply(1:M, function(i) { t.test(x[i,ingroup], x[i,outgroup])$p.value
})
names(pv) <- rownames(x)
pv.sig <- names(pv)[pv < 0.05/M/G] ## bonferonni
pv.sig
})


I've already tested many different combinations, but I am too weak in writing functions to solve what is wrong. So, I ask you as pros for help and I'll appreciate it much.

Best regards!

deg ttest function • 854 views
0
Entering edit mode

Is this RNA-seq? If so, I'd highly recommend using an established method based on negative binomial distributions like DESeq2 or edgeR rather than trying to use a homebrew method. They will be more accurate and sensitive, plus they handle outliers in sensible ways and are flexible enough to do any sort of comparison you'd want.

They have very good documentation and are generally easy to use as well.

0
Entering edit mode

Indeed, this is RNA-seq. Thank you for your suggestion, I think I'll do that as the method you mentioned are broadly recognized.

Nevertheless, as my abilities to write functions are very poor and I am trying to learn something, I wish I could solve the presented issue anyway. Just for my knowledge.

0
Entering edit mode

I don't think your ingroup (or outgroup) variables will end up with anything in them as they currently are. Generally, it is often a good idea to run your logic outside of a function first (or abuse print statements) to be sure you're getting what you expect at each step.