Question: Generate All Possible Permutations Of Sample Labels In R
2
7.0 years ago by
Paul750
United States
Paul750 wrote:

Hi,

I'm trying do some permutation testing on microarray data in R and the sample labels look like this:

``````labels <- c("case_1", "case_2", "case_3", "case_4", "case_5",
"control_1", "control_2", "control_3", "control_4", "control_5")
``````

To permute them I realize I could just repeatedly use "sample(labels)" to do random permutations, but I'd like to look ALL possible permutations.

I think there should be ((10 choose 5) / 2) = 126 possibilities.

Does anyone know if there's an easy way to generate these though?

Thanks!

R microarray • 18k views
modified 7.0 years ago by Obi Griffith17k • written 7.0 years ago by Paul750
1

Either you want combination or permutation, that does not produces 126 possibilities. Can you be more specific, how did you calculate 126 possibilities?

1

Ahh. I think I get it now. They want all possible 5 sample vs 5 sample comparisons. For each of the 252 possible combinations of 10 choose 5 that I calculated below you would compare those 5 samples to the remaining 5 samples (which are another combination). This pairing of 5 sample combinations will create duplicates (ABCDE vs FGHIJ is the same as FGHIJ vs ABCDE). So, the number of 126 possible pairs of 5 combinations is correct I think. As long as no replacement allowed anywhere.

1

Yep. 'combn' works too (I suggested combinations from 'gregmisc' package). See my answer below if you also need help removing the duplicate cases. Set "combos=t(combn(labels,5))" and then the rest of the code will work as indicated.

Yep Obi, you got that right, see "Chris Millers" message for how to produce it. Startlingly straightforward once you know how!

4
7.0 years ago by
Chris Miller20k
Washington University in St. Louis, MO
Chris Miller20k wrote:

Using the standard R libaries, you can do it with the combn() function

``````> a = c(1,2,3,4)
> combn(a,3)
[,1] [,2] [,3] [,4]
[1,]    1    1    1    2
[2,]    2    2    3    3
[3,]    3    4    4    4
``````

Your combinations are in the columns.

Yes! This produces exactly what I was looking for:

a <- c(1,2,3,4,5,6,7,8,9,10) combn(a, 5)

Produces a matrix with 252 columns. Now All I need to do is remove the duplicates and I've got the 126 I was looking for!

combn(labels, 5) from the labels vector in my original post is exactly what I'm looking for.

4
7.0 years ago by
Obi Griffith17k
Washington University, St Louis, USA
Obi Griffith17k wrote:

You probably don't want permutations, since order doesn't matter, correct? I'm guessing you want all possible combinations of 5 from your vector of 10 (without replacement). Check out the functions 'permutations' and 'combinations' from package 'gregmisc'. If you want all possible comparisons of 5 samples versus 5 samples you can determine the remaining samples for each random combination of 5 and then remove duplicates with a combination of 'sort' and 'duplicated'. I think the code you want is below. That produces 252 possible combinations of 5 samples from your list of 10 and the 126 possible comparisons of 5 samples versus 5 samples.

``````library(gregmisc)
labels=c("case_1","case_2","case_3","case_4","case_5","control_1","control_2","control_3","control_4","control_5")

#Get all possible combinations of 5 samples (without replacement)
combos=combinations(n=10, r=5, v=labels)

#Get all possible comparisons of 5 samples vs 5 samples
getpair=function(x){return(labels[which(!labels %in% x)])}
combo_pairs=t(apply(combos,1,getpair))
comparisons=cbind(apply(combos,1,paste, collapse=" "),apply(combo_pairs,1,paste, collapse=" "))
comparisons.sort=t(apply(comparisons, 1, sort))
comparisons.uniq=comparisons[!duplicated(comparisons.sort),]
``````

A small aside, this question might have been better posed to the statistics stackexchange where it was answered or stackoverflow [R tagged] where it has been addressed many times. It is not directly a bioinformatics problem, although I grant that it comes up in bioinformatics a lot, and my preferred solution is buried pretty deep in those sources.

ADD COMMENTlink modified 7.0 years ago • written 7.0 years ago by Obi Griffith17k

Obi, I am interested in applying your code above to a similar problem but instead of two groups (control and case) I have three groups (control, caseA, caseB). I would like to permute the labels of the samples across the three phenotypes and then compare control vs caseA, control vs caseB, caseA vs caseB using limma-voom for differential expression. I have 4 patients in each group, so:

``````library(gtools)
labels=c("case_1A","case_2A","case_3A","case_4A","case_1B","case_2B","case_3B","case_4B","control_1","control_2","control_3","control_4")
combos=combinations(n=12, r=4, v=labels)
``````

But then how can I obtain all possible comparisons: control vs caseA (4 vs 4) control vs caseB (4 vs 4) caseA vs caseB (4 vs 4) Should I use a similar function as the one you have above?

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by a.e.c30
3
7.0 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

It's not clear from your question precisely which combinations of labels you want to generate. If you have 10 sample labels (columns) and "all possible combinations" means "any label in any column once", you're looking at 10! = 3628800 combinations. If that's not what you mean, you need to define the problem more clearly.

Also, in microarray permutation testing, people have an unfortunate habit of using the word "permutation" to mean "number of times the labels were randomised" (e.g. "1000 permutations").

I don't think you need to reinvent the wheel for this task. Search the Bioconductor website for permutation; there are several useful packages there, such as multtest.

This page on statistical testing is also a useful read, particularly the section on permutation tests.

+1 to this comment. Most likely one of the approaches described in multtest will do what you want without reinventing the wheel.