Question: Generate All Possible Permutations Of Sample Labels In R
2
gravatar for Paul
6.6 years ago by
Paul740
United States
Paul740 wrote:

Hi,

I'm trying do some permutation testing on microarray data in R and the sample labels look like this:

labels <- c("case_1", "case_2", "case_3", "case_4", "case_5",
            "control_1", "control_2", "control_3", "control_4", "control_5")

To permute them I realize I could just repeatedly use "sample(labels)" to do random permutations, but I'd like to look ALL possible permutations.

I think there should be ((10 choose 5) / 2) = 126 possibilities.

Does anyone know if there's an easy way to generate these though?

Thanks!

R microarray • 17k views
ADD COMMENTlink modified 6.6 years ago by Obi Griffith17k • written 6.6 years ago by Paul740
1

Either you want combination or permutation, that does not produces 126 possibilities. Can you be more specific, how did you calculate 126 possibilities?

ADD REPLYlink written 6.6 years ago by Vikas Bansal2.3k
1

Ahh. I think I get it now. They want all possible 5 sample vs 5 sample comparisons. For each of the 252 possible combinations of 10 choose 5 that I calculated below you would compare those 5 samples to the remaining 5 samples (which are another combination). This pairing of 5 sample combinations will create duplicates (ABCDE vs FGHIJ is the same as FGHIJ vs ABCDE). So, the number of 126 possible pairs of 5 combinations is correct I think. As long as no replacement allowed anywhere.

ADD REPLYlink written 6.6 years ago by Obi Griffith17k
1

Yep. 'combn' works too (I suggested combinations from 'gregmisc' package). See my answer below if you also need help removing the duplicate cases. Set "combos=t(combn(labels,5))" and then the rest of the code will work as indicated.

ADD REPLYlink written 6.6 years ago by Obi Griffith17k

Yep Obi, you got that right, see "Chris Millers" message for how to produce it. Startlingly straightforward once you know how!

ADD REPLYlink written 6.6 years ago by Paul740
4
gravatar for Obi Griffith
6.6 years ago by
Obi Griffith17k
Washington University, St Louis, USA
Obi Griffith17k wrote:

You probably don't want permutations, since order doesn't matter, correct? I'm guessing you want all possible combinations of 5 from your vector of 10 (without replacement). Check out the functions 'permutations' and 'combinations' from package 'gregmisc'. If you want all possible comparisons of 5 samples versus 5 samples you can determine the remaining samples for each random combination of 5 and then remove duplicates with a combination of 'sort' and 'duplicated'. I think the code you want is below. That produces 252 possible combinations of 5 samples from your list of 10 and the 126 possible comparisons of 5 samples versus 5 samples.

library(gregmisc)
labels=c("case_1","case_2","case_3","case_4","case_5","control_1","control_2","control_3","control_4","control_5")

#Get all possible combinations of 5 samples (without replacement)
combos=combinations(n=10, r=5, v=labels)

#Get all possible comparisons of 5 samples vs 5 samples
getpair=function(x){return(labels[which(!labels %in% x)])}
combo_pairs=t(apply(combos,1,getpair))
comparisons=cbind(apply(combos,1,paste, collapse=" "),apply(combo_pairs,1,paste, collapse=" "))
comparisons.sort=t(apply(comparisons, 1, sort))
comparisons.uniq=comparisons[!duplicated(comparisons.sort),]

A small aside, this question might have been better posed to the statistics stackexchange where it was answered or stackoverflow [R tagged] where it has been addressed many times. It is not directly a bioinformatics problem, although I grant that it comes up in bioinformatics a lot, and my preferred solution is buried pretty deep in those sources.

ADD COMMENTlink modified 6.6 years ago • written 6.6 years ago by Obi Griffith17k

Obi, I am interested in applying your code above to a similar problem but instead of two groups (control and case) I have three groups (control, caseA, caseB). I would like to permute the labels of the samples across the three phenotypes and then compare control vs caseA, control vs caseB, caseA vs caseB using limma-voom for differential expression. I have 4 patients in each group, so:

library(gtools)  
labels=c("case_1A","case_2A","case_3A","case_4A","case_1B","case_2B","case_3B","case_4B","control_1","control_2","control_3","control_4")
 combos=combinations(n=12, r=4, v=labels)

But then how can I obtain all possible comparisons: control vs caseA (4 vs 4) control vs caseB (4 vs 4) caseA vs caseB (4 vs 4) Should I use a similar function as the one you have above?

ADD REPLYlink modified 20 months ago • written 20 months ago by a.e.c30
4
gravatar for Chris Miller
6.6 years ago by
Chris Miller20k
Washington University in St. Louis, MO
Chris Miller20k wrote:

Using the standard R libaries, you can do it with the combn() function

> a = c(1,2,3,4)
> combn(a,3)
     [,1] [,2] [,3] [,4]
[1,]    1    1    1    2
[2,]    2    2    3    3
[3,]    3    4    4    4

Your combinations are in the columns.

ADD COMMENTlink written 6.6 years ago by Chris Miller20k

Yes! This produces exactly what I was looking for:

a <- c(1,2,3,4,5,6,7,8,9,10) combn(a, 5)

Produces a matrix with 252 columns. Now All I need to do is remove the duplicates and I've got the 126 I was looking for!

ADD REPLYlink written 6.6 years ago by Paul740

combn(labels, 5) from the labels vector in my original post is exactly what I'm looking for.

ADD REPLYlink written 6.6 years ago by Paul740
3
gravatar for Neilfws
6.6 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

It's not clear from your question precisely which combinations of labels you want to generate. If you have 10 sample labels (columns) and "all possible combinations" means "any label in any column once", you're looking at 10! = 3628800 combinations. If that's not what you mean, you need to define the problem more clearly.

Also, in microarray permutation testing, people have an unfortunate habit of using the word "permutation" to mean "number of times the labels were randomised" (e.g. "1000 permutations").

I don't think you need to reinvent the wheel for this task. Search the Bioconductor website for permutation; there are several useful packages there, such as multtest.

This page on statistical testing is also a useful read, particularly the section on permutation tests.

ADD COMMENTlink written 6.6 years ago by Neilfws48k

+1 to this comment. Most likely one of the approaches described in multtest will do what you want without reinventing the wheel.

ADD REPLYlink written 6.6 years ago by Obi Griffith17k

Fair point, for reasons very specific to this microarray platform and the analysis we're doing I need to run these permutations through limma though! Thanks for the suggestion though!

ADD REPLYlink written 6.6 years ago by Paul740
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 774 users visited in the last hour