Question: Code for multiclass differential gene expression analysis
0
gravatar for orzech_mag
6 weeks ago by
orzech_mag210
Poland/Łódź
orzech_mag210 wrote:

Dear Collegues,

I am trying to perform analysis of differentially expressed genes between multiple groups. For that I found some tutorial (https://jef.works/blog/2017/05/31/multiclass-diffential-expression-analysis/). I am interested in t-test only between one group vs rest. This is corresponding code part:

pv.sig <- lapply(levels(groups), function(g){
  ingroup <- names(groups)[groups %in% g]
  outgroup <- names(groups)[!(groups %in% g)]
  pv <- sapply(1:M, function(i) {
    t.test(x[i,ingroup], x[i,outgroup])$p.value
  })
  names(pv) <- rownames(x)
  pv.sig <- names(pv)[pv < 0.05/M/G] ## bonferonni
  pv.sig
})

When I run it on data created in the tutorial everything is fine. When I run it on my data each time I get such error:

Error in 'x[i, ingroup]': subscript out of bounds

This is how my data looks like: my data

My example file may be downloaded here: example data

This is how my code looks like:

x = read.table("~/Desktop/DAT.txt", sep = "\t", dec = ".", header = T)
groups = c(rep("B", 505), rep("C", 304), rep("O", 301), rep("U", 370))
groups = as.factor(groups)

rownames(x) = x$X
x$X = NULL
x = as.matrix(x)
names(groups) <- colnames(x)

M = 1480
G = 4
pv.sig <- lapply(levels(groups), function(g){
  ingroup <- names(groups)[groups %in% g]
  outgroup <- names(groups)[!(groups %in% g)]
  pv <- sapply(1:M, function(i) {
    t.test(x[i,ingroup], x[i,outgroup])$p.value
  })
  names(pv) <- rownames(x)
  pv.sig <- names(pv)[pv < 0.05/M/G] ## bonferonni
  pv.sig
})

I've already tested many different combinations, but I am too weak in writing functions to solve what is wrong. So, I ask you as pros for help and I'll appreciate it much.

Best regards!

deg ttest function • 91 views
ADD COMMENTlink modified 6 weeks ago • written 6 weeks ago by orzech_mag210

Is this RNA-seq? If so, I'd highly recommend using an established method based on negative binomial distributions like DESeq2 or edgeR rather than trying to use a homebrew method. They will be more accurate and sensitive, plus they handle outliers in sensible ways and are flexible enough to do any sort of comparison you'd want.

They have very good documentation and are generally easy to use as well.

ADD REPLYlink written 6 weeks ago by jared.andrews075.3k

Indeed, this is RNA-seq. Thank you for your suggestion, I think I'll do that as the method you mentioned are broadly recognized.

Nevertheless, as my abilities to write functions are very poor and I am trying to learn something, I wish I could solve the presented issue anyway. Just for my knowledge.

ADD REPLYlink written 6 weeks ago by orzech_mag210

I don't think your ingroup (or outgroup) variables will end up with anything in them as they currently are. Generally, it is often a good idea to run your logic outside of a function first (or abuse print statements) to be sure you're getting what you expect at each step.

ADD REPLYlink written 6 weeks ago by jared.andrews075.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1255 users visited in the last hour