I am trying to do a meta microarray study to determine potential receptor transcripts that could facilitate virus invasion.
The idea is to identify transcripts upregulated in a particular cell type as compared to nine others. From what I understand, this kind of study is fraught with computational and data-normalisation issues, but different approaches have been used with varying degrees of success.
One conceivable approach would be to identify, within each individual microarray dataset, the set of genes expressed above some significance threshold, compile the resulting lists, and then pick out genes uniquely expressed by the target cell type. I can see issues with this right off the bat, however: it might exclude genuine receptor genes, since the assumption that low transcript levels imply too few receptor molecules to facilitate invasion may be erroneous.
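To make the thresholding idea concrete, here is a minimal sketch in base R. The expression values, gene names, and the cutoff of 6 are all made up purely for illustration; real data would come from normalised array matrices.

```r
# Toy data: one log2-intensity vector per cell type for genes g1..g6.
# All values and the threshold below are invented for illustration.
genes <- paste0("g", 1:6)
expr <- list(
  target = setNames(c(9.1, 8.7, 3.2, 7.5, 2.8, 9.9), genes),
  other1 = setNames(c(3.0, 8.9, 3.1, 7.8, 2.5, 4.0), genes),
  other2 = setNames(c(2.9, 8.5, 3.3, 7.2, 2.6, 4.1), genes)
)

threshold <- 6  # hypothetical "expressed" cutoff on the log2 scale

# Genes above the threshold in each dataset.
expressed <- lapply(expr, function(x) names(x)[x > threshold])

# Genes expressed in the target cell type but in none of the others.
unique_to_target <- setdiff(expressed$target,
                            unique(unlist(expressed[-1])))
```

In this toy example `unique_to_target` ends up containing `g1` and `g6`, which illustrates exactly the worry above: a receptor gene sitting just below the cutoff in the target type would be silently dropped.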
I hope I'm not wrong in assuming that the datasets cannot be analysed individually if I am to generate meaningful results. The goal would then be to identify genes differentially expressed in one cell type when compared to the nine others. From what I gather, normalisation is the biggest issue here. What method would best suit this kind of microarray meta-analysis? Would RankProd work (though I read that it compares only two experiments at a time)? I've also come across a method that uses Cohen's d as an effect-size measure: http://www.pnas.org/content/103/16/6368.full (are there any packages that implement this?). I'm basically trying to figure out which approach to comparing the data works best. Any ideas on how to implement this in R would also be very helpful.
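For reference, Cohen's d itself is trivial to compute in base R; the sketch below uses invented expression values for a single gene (four target arrays vs. four others). The effect-size meta-analysis in the linked paper then combines such per-study d values across studies, e.g. with a random-effects model; as far as I know, packages such as GeneMeta (Bioconductor) and metafor (CRAN) implement that combining step, though you should verify they match the paper's exact procedure.

```r
# Cohen's d = (mean_A - mean_B) / pooled standard deviation.
cohens_d <- function(a, b) {
  na <- length(a); nb <- length(b)
  sp <- sqrt(((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2))
  (mean(a) - mean(b)) / sp
}

# Toy log2 expression values for one gene (made up for illustration):
target <- c(9.0, 9.4, 8.8, 9.2)  # arrays from the target cell type
others <- c(4.1, 3.9, 4.3, 4.0)  # arrays from the other cell types
d <- cohens_d(target, others)    # large positive d => upregulated in target
```

In a real analysis you would apply `cohens_d` row-wise across the expression matrix of each study, then pool the per-gene effect sizes across studies.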
In my view, the statement "I hope I'm not wrong in assuming that the datasets cannot be analysed individually if I am to generate meaningful results" is not completely correct. It is true when you use differentially expressed gene (DEG) lists taken from different studies for the meta-analysis: different studies use different normalisation and downstream analysis methods, which leads to inconsistencies between their results (and those will certainly not be meaningful). But you can apply exactly the same analysis procedure to each individual study and then compare the results. For more background on meta-analysis, see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2528050/ and http://www.sciencedirect.com/science/article/pii/S0378111907003289
However, as far as the choice of the best normalisation method is concerned, that depends on the stage of data integration (to my knowledge; please correct me if I am wrong). In a meta-analysis you can integrate your data either at an early stage (the data source) or at the results level (most commonly the DEG lists). If you integrate your data at the early stage (i.e., by merging the gene-expression matrices), there will be batch effects. Here you can use ComBat (see "How to correct for batch effect in microarray meta-analysis"). Other choices are also available; you can find them at http://www.hindawi.com/journals/isrn/2014/345106/. If you are integrating at the results level instead, I think most normalisation methods will work fine. There is a very good review article on meta-analysis published in 2012 where you can find answers to most of your queries: http://www.ncbi.nlm.nih.gov/pubmed/22262733
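To see what a batch effect looks like when matrices are merged, here is a toy sketch in base R using simulated data: study 2's arrays are artificially shifted up, and a naive per-batch mean-centering removes the shift. This is only an illustration of the problem; ComBat (in Bioconductor's sva package) does the adjustment far more robustly via empirical Bayes shrinkage.

```r
# Simulated merged matrix: 5 genes x 6 arrays, arrays 1-3 from study 1
# and arrays 4-6 from study 2 (all values invented for illustration).
set.seed(42)
batch <- factor(c(1, 1, 1, 2, 2, 2))
expr <- matrix(rnorm(5 * 6, mean = 7), nrow = 5)
expr[, batch == 2] <- expr[, batch == 2] + 2  # study 2 is shifted up

# Naive fix: subtract each gene's batch-specific mean.
centered <- expr
for (b in levels(batch)) {
  idx <- batch == b
  centered[, idx] <- expr[, idx] - rowMeans(expr[, idx])
}
# After centering, each gene's mean within every batch is zero,
# so the systematic study-2 offset is gone.
```

With the sva package the corresponding (untested) call would look roughly like `sva::ComBat(dat = merged_matrix, batch = batch)`, optionally passing a model matrix of biological covariates via `mod` so that real group differences are preserved.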
Which approach is best also depends on your objective and expertise. I would suggest first making a list of the data-integration stages, the issues involved, and the methods available at each, and then selecting the method that suits your objective and expertise. Using R for meta-analysis is a very good choice: microarray meta-analysis relies on basically the same methods as standard microarray analysis, and R and Bioconductor are the best-known tools for microarray data analysis.
I hope this will help.