Hi,
I am working with public transcriptomics datasets (both microarray and RNA-Seq) to study gene signatures of bacterially infected samples vs. healthy control samples. For the microarray data, the starting point is .CEL files for Affymetrix arrays (normalized with GCRMA) and .txt files for Illumina/Agilent arrays (normal-exponential background correction and percentile normalization). For RNA-Seq, the starting point is raw gene counts, which are then normalized using size factors in DESeq2. After normalizing each dataset individually, I would like to integrate the microarray and RNA-Seq datasets into one matrix and make them comparable for heatmap visualization. I came across COCONUT, ComBat, and sva, which allow pooled analysis. I would appreciate your input on whether these packages can be used for batch correction and to standardize the data so that it is comparable, or whether other packages would serve this purpose. Paper references would also be very helpful.
Thank you,
Best Regards,
Toufiq
All datasets have infected vs. control samples. A couple of datasets also include other sample groups, but I am not considering those at the moment.
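To make the goal concrete, here is a minimal sketch of the kind of per-dataset standardization I have in mind before building the heatmap matrix. It is not COCONUT or ComBat. It is a naive stand-in that z-scores each gene against the healthy controls within its own dataset and then merges the datasets on their shared genes. The function names (`standardize_to_controls`, `merge_on_common_genes`), the gene names, and all numbers are hypothetical toy values, and it assumes the inputs are already normalized and on a log2 scale.

```python
from statistics import mean, stdev


def standardize_to_controls(expr, is_control):
    """Z-score each gene against the control samples of one dataset.

    expr: dict mapping gene -> list of log2 expression values (one per sample)
    is_control: list of booleans, True where the sample is a healthy control
    Assumes at least two control samples per dataset (needed for stdev).
    """
    out = {}
    for gene, values in expr.items():
        ctrl = [v for v, c in zip(values, is_control) if c]
        mu = mean(ctrl)
        sd = stdev(ctrl)
        if sd == 0:
            sd = 1.0  # avoid division by zero for genes that are flat in controls
        out[gene] = [(v - mu) / sd for v in values]
    return out


def merge_on_common_genes(datasets):
    """Concatenate samples across datasets, keeping only genes present in all."""
    common = set.intersection(*(set(d) for d in datasets))
    return {g: [v for d in datasets for v in d[g]] for g in sorted(common)}


# Hypothetical toy data: two controls followed by two infected samples each.
array = {
    "GENE_A": [8.0, 8.2, 10.1, 10.3],
    "GENE_B": [6.0, 6.4, 6.1, 6.3],
    "GENE_C": [5.0, 5.5, 5.2, 5.1],  # only measured on the array
}
rnaseq = {
    "GENE_A": [3.0, 3.4, 5.2, 5.0],
    "GENE_B": [1.0, 1.2, 1.1, 1.3],
}
labels = [True, True, False, False]

merged = merge_on_common_genes([
    standardize_to_controls(array, labels),
    standardize_to_controls(rnaseq, labels),
])
```

Here `merged` holds one row per shared gene and one column per sample across both platforms, with each dataset expressed relative to its own controls. This only removes per-gene location and scale differences between datasets and relies on the controls being comparable across studies, which is why I am asking whether a dedicated method would be more appropriate.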