Question: Code for multiclass differential gene expression analysis
0
gravatar for orzech_mag
8 months ago by
orzech_mag220
Poland/Łódź
orzech_mag220 wrote:

Dear Collegues,

I am trying to perform analysis of differentially expressed genes between multiple groups. For that I found some tutorial (https://jef.works/blog/2017/05/31/multiclass-diffential-expression-analysis/). I am interested in t-test only between one group vs rest. This is corresponding code part:

pv.sig <- lapply(levels(groups), function(g){
  ingroup <- names(groups)[groups %in% g]
  outgroup <- names(groups)[!(groups %in% g)]
  pv <- sapply(1:M, function(i) {
    t.test(x[i,ingroup], x[i,outgroup])$p.value
  })
  names(pv) <- rownames(x)
  pv.sig <- names(pv)[pv < 0.05/M/G] ## bonferonni
  pv.sig
})

When I run it on data created in the tutorial everything is fine. When I run it on my data each time I get such error:

Error in 'x[i, ingroup]': subscript out of bounds

This is how my data looks like: my data

My example file may be downloaded here: example data

This is how my code looks like:

x = read.table("~/Desktop/DAT.txt", sep = "\t", dec = ".", header = T)
groups = c(rep("B", 505), rep("C", 304), rep("O", 301), rep("U", 370))
groups = as.factor(groups)

rownames(x) = x$X
x$X = NULL
x = as.matrix(x)
names(groups) <- colnames(x)

M = 1480
G = 4
pv.sig <- lapply(levels(groups), function(g){
  ingroup <- names(groups)[groups %in% g]
  outgroup <- names(groups)[!(groups %in% g)]
  pv <- sapply(1:M, function(i) {
    t.test(x[i,ingroup], x[i,outgroup])$p.value
  })
  names(pv) <- rownames(x)
  pv.sig <- names(pv)[pv < 0.05/M/G] ## bonferonni
  pv.sig
})

I've already tested many different combinations, but I am too weak in writing functions to solve what is wrong. So, I ask you as pros for help and I'll appreciate it much.

Best regards!

deg ttest function • 229 views
ADD COMMENTlink modified 8 months ago • written 8 months ago by orzech_mag220

Is this RNA-seq? If so, I'd highly recommend using an established method based on negative binomial distributions like DESeq2 or edgeR rather than trying to use a homebrew method. They will be more accurate and sensitive, plus they handle outliers in sensible ways and are flexible enough to do any sort of comparison you'd want.

They have very good documentation and are generally easy to use as well.

ADD REPLYlink written 8 months ago by jared.andrews077.5k

Indeed, this is RNA-seq. Thank you for your suggestion, I think I'll do that as the method you mentioned are broadly recognized.

Nevertheless, as my abilities to write functions are very poor and I am trying to learn something, I wish I could solve the presented issue anyway. Just for my knowledge.

ADD REPLYlink written 8 months ago by orzech_mag220

I don't think your ingroup (or outgroup) variables will end up with anything in them as they currently are. Generally, it is often a good idea to run your logic outside of a function first (or abuse print statements) to be sure you're getting what you expect at each step.

ADD REPLYlink written 8 months ago by jared.andrews077.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1017 users visited in the last hour