I’m currently working on a multispecies comparison of RNA-seq transcriptome data and I would like to get your input on the design of my analysis steps.
I am comparing 4 species, each on 3 different hosts plus 1 control, so 16 groups in total, including one control group per species. I plan to analyse my transcript expression data in the same way as the other data I collected: I took performance measurements of the specimens and, to compare them correctly, corrected these values using the values in the control group. Because each species has a different (species-specific) response, correcting against its own control group makes the values of the different species directly comparable. I am thinking of doing the same for the gene expression data, and I would like to get your thoughts on this.
In the meantime I have read a lot of comparative papers (e.g. as covered in this discussion: https://www.biostars.org/p/358805/), and the very first step is determining what the ‘same’ genes are (e.g. by BLAST against a reference database or by orthologue analysis). After this, many studies do a co-expression analysis of all gene expression data combined, or something similar, but I do not really see the use of this for my data. These papers compare, for example, tissue types in different species without having a “neutral” control group.
I would like to get your views on this. To be a bit more focussed, my main issues:
- Given my dataset, I want to correct the expression data of similar genes (matched by ID/orthology) using the expression level in the control group of each species, so that the species can be compared directly. Does this approach make sense?
- Perhaps you are aware of a good recent paper dealing with multispecies comparison that takes a different approach than ‘identification -> co-expression’?
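To make the control-based correction concrete, here is a minimal Python sketch of the arithmetic I have in mind. It is only an illustration under assumptions: the species names, the orthologue ID `OG0001`, the expression values, the pseudocount, and the function name `log2_ratio_to_control` are all hypothetical, and the input is assumed to be already-normalised mean expression per group. It ignores replicates and variance, so it is not a substitute for a proper statistical model.

```python
import math

# Hypothetical toy data (all names and values are made up for illustration):
# (species, condition, orthologue_id) -> mean normalised expression per group.
expression = {
    ("speciesA", "control", "OG0001"): 10.0,
    ("speciesA", "host1", "OG0001"): 40.0,
    ("speciesB", "control", "OG0001"): 100.0,
    ("speciesB", "host1", "OG0001"): 200.0,
}


def log2_ratio_to_control(expression, control="control", pseudocount=1.0):
    """Express each host group as a log2 ratio to the control of the SAME species.

    The pseudocount (an assumed value) avoids division by zero and log(0)
    for genes with no expression in one of the groups.
    """
    relative = {}
    for (species, condition, orthologue), value in expression.items():
        if condition == control:
            continue  # the control is the baseline, not a result
        baseline = expression.get((species, control, orthologue))
        if baseline is None:
            continue  # no control value for this orthologue in this species
        ratio = (value + pseudocount) / (baseline + pseudocount)
        relative[(species, condition, orthologue)] = math.log2(ratio)
    return relative


relative = log2_ratio_to_control(expression)
for key in sorted(relative):
    print(key, round(relative[key], 3))
```

The resulting values are on a within-species "change relative to control" scale, which is what I would then compare between species for each orthologue and host.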
Thanks a lot in advance!