Generate All Possible Permutations Of Sample Labels In R
3
2
Entering edit mode
10.7 years ago
Paul ▴ 760

Hi,

I'm trying do some permutation testing on microarray data in R and the sample labels look like this:

labels <- c("case_1", "case_2", "case_3", "case_4", "case_5",
"control_1", "control_2", "control_3", "control_4", "control_5")


To permute them I realize I could just repeatedly use "sample(labels)" to do random permutations, but I'd like to look ALL possible permutations.

I think there should be ((10 choose 5) / 2) = 126 possibilities.

Does anyone know if there's an easy way to generate these though?

Thanks!

r microarray • 25k views
1
Entering edit mode

Either you want combination or permutation, that does not produces 126 possibilities. Can you be more specific, how did you calculate 126 possibilities?

1
Entering edit mode

Ahh. I think I get it now. They want all possible 5 sample vs 5 sample comparisons. For each of the 252 possible combinations of 10 choose 5 that I calculated below you would compare those 5 samples to the remaining 5 samples (which are another combination). This pairing of 5 sample combinations will create duplicates (ABCDE vs FGHIJ is the same as FGHIJ vs ABCDE). So, the number of 126 possible pairs of 5 combinations is correct I think. As long as no replacement allowed anywhere.

1
Entering edit mode

Yep. 'combn' works too (I suggested combinations from 'gregmisc' package). See my answer below if you also need help removing the duplicate cases. Set "combos=t(combn(labels,5))" and then the rest of the code will work as indicated.

0
Entering edit mode

Yep Obi, you got that right, see "Chris Millers" message for how to produce it. Startlingly straightforward once you know how!

4
Entering edit mode
10.7 years ago

Using the standard R libaries, you can do it with the combn() function

> a = c(1,2,3,4)
> combn(a,3)
[,1] [,2] [,3] [,4]
[1,]    1    1    1    2
[2,]    2    2    3    3
[3,]    3    4    4    4


Your combinations are in the columns.

0
Entering edit mode

Yes! This produces exactly what I was looking for:

a <- c(1,2,3,4,5,6,7,8,9,10)
combn(a, 5)


Produces a matrix with 252 columns. Now All I need to do is remove the duplicates and I've got the 126 I was looking for!

0
Entering edit mode

combn(labels, 5) from the labels vector in my original post is exactly what I'm looking for.

4
Entering edit mode
10.7 years ago

You probably don't want permutations, since order doesn't matter, correct? I'm guessing you want all possible combinations of 5 from your vector of 10 (without replacement). Check out the functions 'permutations' and 'combinations' from package 'gregmisc'. If you want all possible comparisons of 5 samples versus 5 samples you can determine the remaining samples for each random combination of 5 and then remove duplicates with a combination of 'sort' and 'duplicated'. I think the code you want is below. That produces 252 possible combinations of 5 samples from your list of 10 and the 126 possible comparisons of 5 samples versus 5 samples.

library(gregmisc)
labels=c("case_1","case_2","case_3","case_4","case_5","control_1","control_2","control_3","control_4","control_5")

#Get all possible combinations of 5 samples (without replacement)
combos=combinations(n=10, r=5, v=labels)

#Get all possible comparisons of 5 samples vs 5 samples
getpair=function(x){return(labels[which(!labels %in% x)])}
combo_pairs=t(apply(combos,1,getpair))
comparisons=cbind(apply(combos,1,paste, collapse=" "),apply(combo_pairs,1,paste, collapse=" "))
comparisons.sort=t(apply(comparisons, 1, sort))
comparisons.uniq=comparisons[!duplicated(comparisons.sort),]


A small aside, this question might have been better posed to the statistics stackexchange where it was answered or stackoverflow [R tagged] where it has been addressed many times. It is not directly a bioinformatics problem, although I grant that it comes up in bioinformatics a lot, and my preferred solution is buried pretty deep in those sources.

0
Entering edit mode

Obi, I am interested in applying your code above to a similar problem but instead of two groups (control and case) I have three groups (control, caseA, caseB). I would like to permute the labels of the samples across the three phenotypes and then compare control vs caseA, control vs caseB, caseA vs caseB using limma-voom for differential expression. I have 4 patients in each group, so:

library(gtools)
labels=c("case_1A","case_2A","case_3A","case_4A","case_1B","case_2B","case_3B","case_4B","control_1","control_2","control_3","control_4")
combos=combinations(n=12, r=4, v=labels)


But then how can I obtain all possible comparisons: control vs caseA (4 vs 4) control vs caseB (4 vs 4) caseA vs caseB (4 vs 4) Should I use a similar function as the one you have above?

3
Entering edit mode
10.7 years ago
Neilfws 49k

It's not clear from your question precisely which combinations of labels you want to generate. If you have 10 sample labels (columns) and "all possible combinations" means "any label in any column once", you're looking at 10! = 3628800 combinations. If that's not what you mean, you need to define the problem more clearly.

Also, in microarray permutation testing, people have an unfortunate habit of using the word "permutation" to mean "number of times the labels were randomised" (e.g. "1000 permutations").

I don't think you need to reinvent the wheel for this task. Search the Bioconductor website for permutation; there are several useful packages there, such as multtest.

0
Entering edit mode

+1 to this comment. Most likely one of the approaches described in multtest will do what you want without reinventing the wheel.

0
Entering edit mode

Fair point, for reasons very specific to this microarray platform and the analysis we're doing I need to run these permutations through limma though! Thanks for the suggestion though!