I have RNA-seq data from purified cell type A (one sample, no replicates) and microarray data from a mixture of cell types A and B. I would like to perform gene expression deconvolution to estimate the proportion of each cell type and the gene expression of cell type B in the mixture. However, the two datasets come from different platforms, so I need to convert cell type A's RNA-seq data to microarray-like data before doing the deconvolution. How can I do this conversion?
What I mean is this. As described in the limma package (page 69): "In the limma approach to RNA-seq, read counts are converted to log2-counts-per-million (logCPM) and the mean-variance relationship is modelled either with precision weights or with an empirical Bayes prior trend. In either case, the RNA-seq data can be analyzed as if it was microarray data."
My concerns: 1. I am not sure whether I can use these two methods for my task, as they are designed for differential expression analysis. 2. I feel that "the counts are converted to logCPM values using edgeR’s cpm function", i.e. a simple log2 CPM transformation, may not be enough for my task. After this transformation, will I be able to treat the transformed data as if they came from a microarray? 3. If I use voom with only one sample, is the design matrix just 1?
Do you mean that you want to encode the RNA-seq data as an ExpressionSet object in R, and/or that you just want your RNA-seq data transformed to log (base 2)? Which deconvolution package are you using?
I have edited my question and hope it is clearer now. I want to treat my RNA-seq data as if it were microarray data.
I ask the following only in the interest of clarification.
Does that mean a matrix of counts/values where the rows represent genes, the columns represent samples, and each cell holds an expression estimate (a number)?
It's tough to deconvolve RNA-seq data: it doesn't sum sources linearly the way microarray data does (more or less), so typical deconvolution methods tend to perform relatively poorly. (Their papers tend to state otherwise, but they rarely show useful non-microarray comparisons.)
My suggestion would be to take the voom-transformed values and put them, together with the microarray values, through ComBat (in the sva package in R) and hope you still get something useful. There's going to be a messy batch effect between the two datasets to begin with, and it is further complicated by the difference in cell type, but that's pretty much your only path forward with this dataset.
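For reference, here is a minimal sketch of the logCPM transform that voom applies to counts, log2((count + 0.5) / (library size + 1) × 10^6), written in Python with NumPy rather than R. The function name `voom_style_logcpm` and the toy counts are made up for illustration. The sketch covers only the point transform; voom's precision weights are estimated from residual variation across samples, so with a single sample the transformed values are probably all you can get out of it.

```python
import numpy as np


def voom_style_logcpm(counts):
    """Return log2-CPM values using the offsets voom uses.

    counts: genes x samples array of raw read counts (a 1-D array is
    treated as a single sample). The library size is taken as the column
    sum; no normalization factors are applied in this sketch.
    """
    counts = np.asarray(counts, dtype=float)
    if counts.ndim == 1:
        counts = counts[:, np.newaxis]  # one sample -> genes x 1 matrix
    lib_size = counts.sum(axis=0)
    # The 0.5 offset keeps zero counts finite after taking the log.
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)


# Hypothetical counts for four genes in one sample (illustration only).
toy_counts = np.array([0, 10, 90, 900])
toy_logcpm = voom_style_logcpm(toy_counts)
```

The resulting genes x 1 matrix is on a log2 scale, like typical normalized microarray intensities, but it is not yet on the same distribution as the array data; that remaining gap is what the batch-correction step is meant to address.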