Hi, I have raw read counts from two targeted RNA-seq platforms: one targets 2256 probes and the other 1450 probes, with 700 genes in common. The chemistry of the two platforms is the same, and the same patients were run on both. For the 700 common genes, the correlation between samples across the two platforms is 80 percent. I want to merge the two platforms into one dataset covering the genes from both (2256 + 1450 - 700 shared = 3006 genes, assuming one probe per gene). Could I simply take the mean of the raw counts for the 700 common genes? Any suggestions, please. Thank you.
Several warning flags come to mind from this description, especially if the goal is a predictive model for responder vs non-responder patients. A lot depends on specifics: the sample numbers, the distribution of the data, how similar the probes are, and even the disease model, since genetics can play a role. Correlation is an indicator of gene expression consistency, but not a 'good' one on its own, since it can be misleading, for example when it is driven by a few outliers.
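To illustrate the point about outliers, here is a minimal sketch on made-up numbers (not your data). The two vectors are independent, hypothetical expression values; adding a single extreme gene pushes the Pearson correlation close to 1, while a rank-based (Spearman-style) correlation barely moves. The helper names `pearson` and `spearman` are my own, and the rank step assumes there are no tied values.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Rank-based correlation: Pearson on the ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)

rng = np.random.default_rng(0)

# Hypothetical values for 200 genes on two platforms, generated independently,
# so there is no real agreement between them.
platform_a = rng.uniform(10, 100, size=200)
platform_b = rng.uniform(10, 100, size=200)

# Add one very highly expressed gene that is extreme on both platforms.
a_with_outlier = np.append(platform_a, 100_000.0)
b_with_outlier = np.append(platform_b, 120_000.0)

pearson_before = pearson(platform_a, platform_b)
pearson_after = pearson(a_with_outlier, b_with_outlier)
spearman_after = spearman(a_with_outlier, b_with_outlier)

print(f"Pearson without the outlier: {pearson_before:.3f}")
print(f"Pearson with the outlier:    {pearson_after:.3f}")
print(f"Spearman with the outlier:   {spearman_after:.3f}")
```

So an 80 percent correlation on raw counts is worth re-checking on log-transformed values or ranks, and gene by gene, before relying on it.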
1) Are the probes the same between platforms? It seems the targeted panels are not the same, since only 700 genes are common, so even a shared gene may be measured by different probes.
2) A batch effect between the two platforms is pretty much guaranteed, so it is wise to test for it during the evaluation.
3) Are there useful controls for looking at variation within each dataset?
4) While the chemistry of the platforms may be the same, sample prep and the person doing the work also play a role.
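As a rough way to look at point 2 on the 700 shared genes, one could normalize each platform separately and then compare the same patients gene by gene. The sketch below uses simulated counts in place of your data: the probe efficiencies, the patient count of 20, and the helper names `log_cpm` and `rowwise_corr` are all assumptions for illustration. The log2 counts-per-million here is computed on the shared genes only, which is a simplification.

```python
import numpy as np

def log_cpm(counts, prior=0.5):
    """log2 counts-per-million for a genes x samples matrix, with a small prior."""
    lib_size = counts.sum(axis=0, keepdims=True)
    return np.log2((counts + prior) / lib_size * 1e6)

def rowwise_corr(x, y):
    """Pearson correlation of matching rows (genes) across columns (patients)."""
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

rng = np.random.default_rng(1)
n_genes, n_patients = 700, 20  # 700 shared genes; patient count is hypothetical

# Simulated "true" expression per gene and patient.
base = rng.lognormal(mean=5.0, sigma=1.0, size=(n_genes, 1))
true_expr = base * 2.0 ** rng.normal(0.0, 0.5, size=(n_genes, n_patients))

# Assumed platform-specific probe efficiency per gene (the batch effect).
eff_a = 2.0 ** rng.normal(0.0, 1.0, size=(n_genes, 1))
eff_b = 2.0 ** rng.normal(0.0, 1.0, size=(n_genes, 1))
counts_a = rng.poisson(true_expr * eff_a)
counts_b = rng.poisson(true_expr * eff_b)

lcpm_a = log_cpm(counts_a)
lcpm_b = log_cpm(counts_b)

# Per-gene platform offset: mean paired difference across the same patients.
gene_offset = (lcpm_b - lcpm_a).mean(axis=1)
# Per-gene agreement across patients, which ignores a constant offset.
gene_corr = rowwise_corr(lcpm_a, lcpm_b)

print(f"spread (SD) of per-gene offsets, log2 units: {gene_offset.std():.2f}")
print(f"median per-gene correlation across patients: {np.median(gene_corr):.2f}")
```

In this simulation the per-gene correlations are high even though many genes differ by a large, gene-specific offset between platforms. That is the situation where taking the mean of raw counts is hard to interpret, and where some platform correction (or keeping platform as a covariate) would need to be considered first.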
These are just a few thoughts to help guide the analysis.