I am trying to replicate the analysis conducted in Brennand et al. (http://www.ncbi.nlm.nih.gov/pubmed/24686136) to cluster microarray expression data (or, in my case, RNA-seq data) with data available in the Allen Brain Atlas (ABA). I'm afraid their methods description is not clear enough for me to understand exactly what they did, and hence how to repeat it.
To quote the paper (which is open access, btw):
Cross-platform comparisons of our hiPSC microarray and the AllenBrain Atlas microarrays were done by (1) ranking absolute gene expression for each microarray using Partek, (2) assigning a rank difference value for each gene using a MATLAB script, and (3) calculating Spearman Rank Correlation Coefficients for each microarray comparison in Microsoft Excel. Wilcoxon’s rank-sum test was assessed if a category of interest (spatial, temporal or combined) had significantly higher Spearman correlations than the background of all pairwise correlations. No hard cutoffs of ‘best matches’ were used.
I have tried the following approach (using R):
- Merge the data frame of ABA microarray data with my normalized RNA-seq count table. This drops genes that are not interrogated by the microarray.
- Rank the data by absolute gene expression using the rank function (ties.method = "min")
- Use cor(rankedJune, method = "spearman") to calculate Spearman rank correlation coefficients
- Run the Wilcoxon rank-sum test for each of the comparisons: -log(pairwise.wilcox.test(correlationSpear2RelevantMelt$value, correlationSpear2RelevantMelt$variable, p.adjust.method = "bonferroni")$p.value)
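To make the steps above concrete, here is a minimal sketch of what I am doing, run on made-up data. Everything in it is a placeholder (the toy tables, the sample names with the my_ and aba_ prefixes, the object names), and it reflects my reading of the steps, not necessarily what the authors did:

```r
# Toy stand-ins for the real tables; all names and values are hypothetical.
set.seed(1)
genes <- paste0("gene", 1:200)
aba <- data.frame(gene = genes,
                  aba_s1 = rexp(200), aba_s2 = rexp(200), aba_s3 = rexp(200))
rnaseq <- data.frame(gene = c(genes[1:150], paste0("extra", 1:20)),
                     my_s1 = rexp(170), my_s2 = rexp(170),
                     my_s3 = rexp(170), my_s4 = rexp(170))

# Step 1: merge on gene ID, keeping only genes present on both platforms.
merged <- merge(aba, rnaseq, by = "gene")
expr <- as.matrix(merged[, -1])
rownames(expr) <- merged$gene

# Step 2: rank genes within each sample (ties get the minimum rank).
ranked <- apply(expr, 2, rank, ties.method = "min")

# Step 3: Spearman correlation between every pair of samples.
# cor() ranks internally for method = "spearman", so step 2 should
# not change this result.
rho <- cor(ranked, method = "spearman")

# Keep only my samples (rows) versus ABA samples (columns), long format.
cross <- rho[grep("^my_", colnames(rho)), grep("^aba_", colnames(rho))]
long <- as.data.frame(as.table(cross))  # columns: Var1, Var2, Freq

# Step 4: pairwise Wilcoxon rank-sum tests between ABA samples,
# Bonferroni-adjusted, reported as -log(p).
pw <- pairwise.wilcox.test(long$Freq, long$Var2,
                           p.adjust.method = "bonferroni")
neg_log_p <- -log(pw$p.value)
```

With random input like this, the cross-platform correlations are near zero and nothing should come out as significant, which at least checks that the pipeline runs end to end.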
The outcome looks like complete bollocks. What am I doing wrong in my approach, and how do I replicate exactly what the authors did on my data? (I've also tried a conventional Spearman correlation between my data and the ABA arrays, and, predictably, the datasets come out as quite distinct.)
Thanks in advance!