Question

Resampling for unbalanced samples


4.6 years ago

Lila M ★ 1.2k

Dear all, I have a data set for two conditions (before and after treatment). The patients are in rows and the gene expression levels are in columns. I want to test whether the treatment makes any difference, but my main problem is that the groups are unbalanced (control 30 : case 60). I would like to resample the 60 case IDs down to 30, compare them with the controls (getting a p-value for each gene), and repeat this around 50 times. I would like to do this in R, but I don't know of a specific function for it. library(boot) looks promising, but I couldn't figure out how to apply it to my data set. Any advice? Thanks

resampling unbalanced samples • 1.2k views

ADD COMMENT • link 4.6 years ago by Lila M ★ 1.2k


Why not feed this directly into limma or a similar package?

ADD REPLY • link 4.6 years ago by JC 13k


Unbalanced samples are not a problem per se as long as the numbers are sufficient for the dispersion estimation. As JC says, feed the data into standard tools such as limma and obtain DEG results. Typically there is no need for custom approaches.

ADD REPLY • link 4.6 years ago by ATpoint 82k


Hi, I know how to do that in limma. The point is that I want to write a function I can apply to a gene table, an OTU table, and so on; in other words, something I can carry over to different approaches. Any clue? Thanks!

ADD REPLY • link 4.6 years ago by Lila M ★ 1.2k


Please use the comment function, not the answer box.

It is unclear what you mean. Is the problem how to randomly pull samples? Please give a representative example.


ADD REPLY • link 4.6 years ago by ATpoint 82k


Sorry, the "add comment" button doesn't work for me; it gives me an error every time, which is why I used the answer box.

Let's say I have a data frame like this:

ID  group   gene1  gene2  gene3 ...
s1  health  0.1    0.07   0.2
s2  cancer  0.5    0.05   0.4

and I have 20 healthy samples and 40 cancer samples. I want to compare each gene between the groups using something like this:

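# One pairwise Wilcoxon test per gene (columns 3 to 227), grouped by sample$group
wilcoxon.t <- lapply(3:227, function(x) pairwise.wilcox.test(sample[[x]], sample$group))
names(wilcoxon.t) <- names(sample)[3:227]

# Flatten each p-value matrix into a vector named "<row>vs<column>"
add.p.val <- sapply(wilcoxon.t, function(x) {
  p <- x$p.value
  n <- outer(rownames(p), colnames(p), paste, sep='vs')
  p <- as.vector(p)
  names(p) <- n
  p
})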

But what I really want is to resample the "cancer" samples (drawing 20 each time) and repeat the Wilcoxon test 50 times on each resampled set.
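A minimal sketch of the resampling described above, using base R only. The helper name `subsample_wilcox`, its arguments, and the toy data frame are all hypothetical and only for illustration. It also swaps `pairwise.wilcox.test` for a plain two-group `wilcox.test` (with `exact = FALSE`, so the normal approximation is used), which should be equivalent when there are only two groups.

```r
# Hypothetical helper: repeatedly subsample the larger group down to the size
# of the smaller group and run a Wilcoxon rank-sum test per feature each time.
subsample_wilcox <- function(df, group_col, feature_cols,
                             large, small, n_rep = 50, seed = 1) {
  set.seed(seed)                                   # reproducible draws
  idx_large <- which(df[[group_col]] == large)     # rows of the larger group
  idx_small <- which(df[[group_col]] == small)     # rows of the smaller group
  n_draw <- length(idx_small)

  reps <- lapply(seq_len(n_rep), function(i) {
    keep <- sample(idx_large, n_draw)              # draw without replacement
    # One p-value per feature for this subsample
    vapply(feature_cols, function(f) {
      wilcox.test(df[[f]][keep], df[[f]][idx_small], exact = FALSE)$p.value
    }, numeric(1))
  })

  # Matrix with one row per feature and one column per repetition
  pvals <- do.call(cbind, reps)
  colnames(pvals) <- paste0("rep", seq_len(n_rep))
  pvals
}

# Toy data (made up): 20 "health" and 40 "cancer" samples, two genes
set.seed(42)
toy <- data.frame(
  ID    = paste0("s", 1:60),
  group = rep(c("health", "cancer"), times = c(20, 40)),
  gene1 = c(rnorm(20, mean = 0), rnorm(40, mean = 1)),
  gene2 = rnorm(60)
)

p <- subsample_wilcox(toy, "group", c("gene1", "gene2"),
                      large = "cancer", small = "health", n_rep = 50)
dim(p)                              # rows = genes, columns = repetitions
median_p <- apply(p, 1, median)     # one possible per-gene summary
```

Because the function only needs a grouping column and a vector of feature column names, the same call should work for an OTU table laid out the same way. Note that the 50 p-values per gene come from overlapping subsamples and are not independent, so any summary of them (such as the median used here) is a rough stability check rather than a formal test.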

ADD REPLY • link 4.6 years ago by Lila M ★ 1.2k


Is it RNA-Seq or microarray? As already mentioned, you should use a dedicated tool rather than reinvent the wheel. For RNA-Seq, use raw counts with DESeq2 or edgeR; for microarray, use limma.

ADD REPLY • link 4.6 years ago by Nicolas Rosewick 10k


This is the third time I have received this answer. I have explained why I want to do this. Please add a comment if you can give any new feedback.

ADD REPLY • link 4.6 years ago by Lila M ★ 1.2k