Are upset plots bad for differential expression analysis?
7 weeks ago
paulimer ▴ 20

Hi all, I’ve been working on RNA-seq data for a while, in the context of (yet another) differential expression app, and wanted to try new ways of representing data. UpSet plots looked interesting, especially given how often Venn diagrams are used to compare sets of genes across multiple contrasts, such as treatment A vs control and treatment B vs control. They can visually answer questions such as: how many genes are differentially expressed in both contrasts? How many are exclusive to one of them?

I initially wanted to represent differentially expressed genes (DEGs) in common between contrasts, then it hit me that there are two types of DEGs in common between treatment A and B: those that are differentially expressed in the same direction, and the others. If Gene 34 is overexpressed in treatment A vs control and underexpressed in treatment B vs control, is there a biological point in placing it in the same group as Gene 56, which is overexpressed in both? Beyond the plot itself, I want the different intersections to be downloadable, so they can be used in functional enrichment tools down the line.
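To make the distinction concrete, here is a minimal sketch in plain Python (the app itself is in R, but the logic ports directly). It assumes each contrast is available as a mapping from significant gene to log2 fold change; the gene names, values, and the `split_shared` helper are all made up for illustration.

```python
# Hypothetical input: log2 fold changes of the significant genes per contrast.
degs_a = {"Gene34": 1.8, "Gene56": 2.1, "Gene78": -0.9}   # treatment A vs control
degs_b = {"Gene34": -1.2, "Gene56": 1.5, "Gene90": 0.7}   # treatment B vs control


def split_shared(lfc_a, lfc_b):
    """Split genes significant in both contrasts by agreement of sign."""
    shared = lfc_a.keys() & lfc_b.keys()
    concordant = sorted(g for g in shared if (lfc_a[g] > 0) == (lfc_b[g] > 0))
    discordant = sorted(g for g in shared if (lfc_a[g] > 0) != (lfc_b[g] > 0))
    return concordant, discordant


concordant, discordant = split_shared(degs_a, degs_b)
print(concordant)  # ['Gene56']
print(discordant)  # ['Gene34']
```

Either list could then be written to a file and passed on to an enrichment tool.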

To me, this is somewhat adjacent (but not identical) to the question of signed versus unsigned networks in WGCNA: whether it makes sense to distinguish between negative and positive correlation between two genes (signed) or not (unsigned) when translating correlation to edge weight. In that case, I think the authors favored signed networks, which in our case would be akin to distinguishing between over- and underexpressed genes?

The issue is that, if one favors DEGs that change in the same direction across two contrasts, most UpSet plot packages I’ve seen (I use R) cannot be used as they are. They rely on a boolean matrix that looks like this:

|       | TreatmentA_vs_Control | TreatmentB_vs_Control | TreatmentC_vs_Control |
|-------|-----------------------|-----------------------|-----------------------|
| Gene1 | TRUE                  | FALSE                 | TRUE                  |
| Gene2 | FALSE                 | FALSE                 | TRUE                  |
| Gene3 | FALSE                 | TRUE                  | TRUE                  |

where TRUE indicates that a gene is differentially expressed in a contrast. Such a matrix cannot encode a priori whether a DEG is DE in the same direction in two contrasts; that is only known once the two sets are compared.

4
7 weeks ago
LChart 840

It depends on how you want to handle this. When I compare several quite different differential expression sets (multiple diseases, or multiple treatments), I split the UpSet plots into "upregulated" and "downregulated" genes. If I'm comparing multiple sub-populations of the same study, or studies with substantially similar designs, I use a standard UpSet plot and enforce both significance and sign, with the sign chosen by majority vote among the contrasts in which the gene is significant (+ - + -> T F T; - - + -> T T F).
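A rough sketch of that majority vote, in plain Python for illustration. It assumes each gene comes with one sign per contrast, coded +1 (significantly up), -1 (significantly down) or 0 (not significant); this coding and the `majority_membership` name are assumptions of the sketch, not part of any package.

```python
def majority_membership(signs):
    """Return one boolean per contrast for a single gene.

    signs: +1 (significant, up), -1 (significant, down) or 0 (not significant).
    A contrast is TRUE only if the gene is significant there AND its sign
    matches the majority sign among the significant contrasts.
    """
    up = sum(1 for s in signs if s > 0)
    down = sum(1 for s in signs if s < 0)
    if up == 0 and down == 0:
        return [False] * len(signs)   # not significant anywhere
    if up == down:
        return None                   # tie: needs a separate rule
    majority = 1 if up > down else -1
    return [s == majority for s in signs]


print(majority_membership([1, -1, 1]))    # [True, False, True]
print(majority_membership([-1, -1, 1]))   # [True, True, False]
print(majority_membership([1, -1]))       # None
```

Ties are returned as `None` on purpose, so the caller has to decide how to settle them.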

Finally, there are cases where 'discordant' differential expression is of interest (such as comparing peak activity and gene expression). In such cases I go as far as splitting the differential expression gene sets by nominal direction, so instead of set1, set2, set3 I have set1.up, set1.down, set2.up, set2.down, set3.up, set3.down, and I color-code the sets by direction. UpSet plots are still a key visual representation in this instance.
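As an illustration of the split-by-direction layout, the sketch below builds the boolean membership matrix with one `.up` and one `.down` column per set. It is plain Python with made-up genes and fold changes; the nested-dict input format and the `directional_row` helper are assumptions of the sketch.

```python
# Hypothetical input: gene -> {set name: log2 fold change}, significant results only.
lfc = {
    "Gene1": {"set1": 1.4, "set3": -0.8},
    "Gene2": {"set3": 2.0},
    "Gene3": {"set2": -1.1, "set3": -1.6},
}
contrasts = ["set1", "set2", "set3"]
columns = [f"{c}.{d}" for c in contrasts for d in ("up", "down")]


def directional_row(gene_lfc):
    """One boolean per set/direction column for a single gene."""
    row = dict.fromkeys(columns, False)
    for contrast, value in gene_lfc.items():
        direction = "up" if value > 0 else "down"
        row[f"{contrast}.{direction}"] = True
    return row


matrix = {gene: directional_row(values) for gene, values in lfc.items()}

print(columns)
for gene, row in matrix.items():
    print(gene, [row[c] for c in columns])
```

The resulting matrix is still purely boolean, so it has the same shape as the input most UpSet plotting packages expect, just with twice as many sets.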

0

That's really interesting! Majority vote is indeed a solution, provided I find an elegant way to settle ties. The app will always be used for a single study, so I think it makes sense not to separate into up- and downregulated genes.

Could you elaborate on what you mean by discordant differential expression?

I'm still reluctant to separate by direction: some up- and downregulated genes could belong to the same biological pathway, so I find the separation a bit arbitrary.

Thanks a lot!

Edit: The issue with your proposed solution is that it hinges a lot on judging what type of sets one is facing. That is a judgement a bioinformatician can make, but not an app. For instance, the majority vote relies on the sub-populations of a study being close to each other, which I can't assess before knowing the study (the context of an app).

1

> Could you elaborate on what you mean by discordant differential expression?

Significant, but opposite sign.

> provided I find an elegant way to settle ties.

Whichever direction has the smallest product of p-values (i.e., which is more significant under a Fisher's method meta-analysis)?
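A minimal sketch of that tie-break, in plain Python. It assumes the p-values of the "up" contrasts and of the "down" contrasts are available separately for the tied gene and are all strictly greater than zero; the function names are invented for this example. In a tie both directions hold the same number of p-values, so comparing Fisher's statistic is the same as comparing the products.

```python
import math


def fisher_statistic(pvalues):
    """Fisher's combined statistic, -2 * sum(ln p).

    A larger statistic means a smaller product of p-values.
    Working in logs avoids underflow; assumes every p > 0.
    """
    return -2.0 * sum(math.log(p) for p in pvalues)


def break_tie(up_pvalues, down_pvalues):
    """Return +1 if the 'up' contrasts are jointly more significant, else -1.

    An exact equality of the two statistics falls back to -1; that choice
    is arbitrary in this sketch.
    """
    if fisher_statistic(up_pvalues) > fisher_statistic(down_pvalues):
        return 1
    return -1


print(break_tie([0.001], [0.04]))              # 1
print(break_tie([0.03, 0.02], [1e-5, 0.01]))   # -1
```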

> but not an app

You could pick the approach you feel is most general; or allow the user to select (via some kind of pull-down) which visualization they want.

0

> Whichever direction has the smallest product of p-values (i.e., which is more significant under a Fisher's method meta-analysis)?

That's indeed a good way to settle this!

> You could pick the approach you feel is most general; or allow the user to select (via some kind of pull-down) which visualization they want.

Your first answer (separating up- and downregulated genes) is probably the most general. A pull-down might indeed be a good idea, even though whether each option is relevant depends on quite a number of factors, which might be a bit difficult for users to understand (and UpSet plots are already not that easy to grasp).