Hi all, I’ve been working on RNA-seq data for a while, in the context of (yet another) differential expression app, and wanted to try new ways of representing data. UpSet plots looked interesting, especially given how often Venn diagrams are used to compare sets of genes across multiple contrasts, such as treatment A vs control and treatment B vs control. They can visually answer questions such as: how many genes are differentially expressed in both contrasts? How many are exclusive to one of them?
I initially wanted to represent the differentially expressed genes (DEGs) shared between contrasts, then it hit me that there are two types of DEGs in common between treatment A and B: the ones that are differentially expressed in the same direction, and the others. If Gene 34 is overexpressed in treatment A vs control and underexpressed in treatment B vs control, is there a biological point in placing it in the same group as Gene 56, which is overexpressed in both? Beyond the plot itself, I wanted the different intersections to be downloadable, so they can be used in functional enrichment tools down the line.
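To make the distinction concrete, here is a minimal base R sketch. The log2 fold changes are made up, Gene78 is added only for illustration, and the column names `lfc_A` and `lfc_B` are hypothetical; it assumes all three genes are already significant in both contrasts.

```r
# Toy log2 fold changes for genes significant in both contrasts (made-up values).
shared <- data.frame(
  gene  = c("Gene34", "Gene56", "Gene78"),
  lfc_A = c( 1.8, 2.1, -1.2),  # treatment A vs control
  lfc_B = c(-1.5, 1.7, -0.9)   # treatment B vs control
)

# Same sign in both contrasts = concordant; opposite signs = discordant.
shared$direction <- ifelse(sign(shared$lfc_A) == sign(shared$lfc_B),
                           "concordant", "discordant")

# One character vector of gene names per class.
gene_lists <- split(shared$gene, shared$direction)
```

Each element of `gene_lists` could then be written out with something like `writeLines()` to feed an enrichment tool.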
To me, this is somewhat adjacent (but not identical) to the question of signed versus unsigned networks in WGCNA: whether it makes sense to distinguish between negative and positive correlation between two genes (signed) or not (unsigned) when translating correlation into edge weight. There, I think the authors favored signed networks, which in our case would be akin to distinguishing between over- and underexpressed genes?
The issue is that, if one wants to keep only DEGs that change in the same direction across two contrasts, then most UpSet plot packages I’ve seen (I use R) cannot be used as-is. They rely on a boolean matrix that looks like this:

|       | TreatmentA_vs_Control | TreatmentB_vs_Control | TreatmentC_vs_Control |
|-------|-----------------------|-----------------------|-----------------------|
| Gene1 | TRUE                  | FALSE                 | TRUE                  |
| Gene2 | FALSE                 | FALSE                 | TRUE                  |
| Gene3 | FALSE                 | TRUE                  | TRUE                  |

where TRUE indicates that a gene is differentially expressed in a contrast. Such a matrix carries no information about direction, so one cannot specify a priori whether a DEG is DE in the same direction in two contrasts before the two sets are compared.
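One possible workaround, sketched here in base R under my own assumptions, is to start from a signed matrix (1 = up, -1 = down, 0 = not DE) and split each contrast into an "up" and a "down" boolean column. The directions below are invented; only the DE/not-DE pattern matches the table above, and the shortened column names are hypothetical.

```r
# Signed version of the matrix: 1 = up, -1 = down, 0 = not DE (made-up directions).
signed <- matrix(
  c(1,  0, -1,
    0,  0,  1,
    0, -1, -1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Gene1", "Gene2", "Gene3"),
                  c("A_vs_Control", "B_vs_Control", "C_vs_Control"))
)

# One boolean column per contrast and direction.
up   <- signed == 1
down <- signed == -1
colnames(up)   <- paste0(colnames(signed), "_up")
colnames(down) <- paste0(colnames(signed), "_down")
directional <- cbind(up, down)

# Genes that are DE in the same direction in contrasts B and C.
b <- signed[, "B_vs_Control"]
cc <- signed[, "C_vs_Control"]
concordant_BC <- rownames(signed)[b != 0 & b == cc]
```

With this layout, a gene can only fall in an intersection of two "up" sets or two "down" sets if it moves the same way in both contrasts. I have not checked this against every package, but `directional` should be usable by tools that expect one membership column per set, possibly after converting it to a 0/1 data frame.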