I've noticed that many people use Venn diagrams in RNA-seq analyses to identify genes that are differentially expressed in common across two or more datasets. For example:
Consider two datasets, each comparing a control with some treatment X. The first dataset was generated from, say, liver tissue and the second from brain tissue. Differentially expressed genes are called for each comparison separately, and suppose we are interested only in upregulated genes. So we have:
- Liver: Control vs Treatment X --> A set A of upregulated genes
- Brain: Control vs Treatment X --> A set B of upregulated genes
Authors then make claims about a set C of genes commonly differentially expressed (here, upregulated) in both tissues, and by implication about genes that are not commonly upregulated, based on:
C = intersection(A, B)
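A minimal sketch of the Venn-diagram approach described above, assuming each analysis yields a one-sided p-value per gene for upregulation and that each gene list is cut at FDR 10% with the Benjamini-Hochberg procedure. The gene names and p-values are hypothetical; real DE tools report their own adjusted p-values, and thresholding those would play the same role.

```python
def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: return the genes in `pvals` called at FDR level q."""
    m = len(pvals)
    ranked = sorted(pvals, key=pvals.get)  # genes in ascending p-value order
    k = 0
    for rank, gene in enumerate(ranked, start=1):
        if pvals[gene] <= rank * q / m:
            k = rank  # largest rank under the BH line
    return set(ranked[:k])

# Hypothetical one-sided p-values for "upregulated under treatment X"
liver_p = {"g1": 0.001, "g2": 0.004, "g3": 0.03, "g4": 0.20, "g5": 0.60}
brain_p = {"g1": 0.002, "g2": 0.35, "g3": 0.01, "g4": 0.008, "g5": 0.90}

A = bh_reject(liver_p, q=0.10)  # upregulated in liver at FDR 10%
B = bh_reject(brain_p, q=0.10)  # upregulated in brain at FDR 10%
C = A & B                       # the Venn-diagram overlap
print(sorted(C))                # ['g1', 'g3']
```

Note that the 10% FDR is a statement about A and about B separately; nothing in this procedure attaches an error rate to C itself.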
I'm a bit skeptical about this. Given that the two datasets were analysed independently and each gene list carries its own uncertainty (e.g. a false discovery rate (FDR) of 10%), wouldn't claims based on their intersection carry greater uncertainty than either list on its own?
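One way to probe this concern is a small simulation in which the ground truth is known. The sketch below works under assumed conditions (independent genes, one-sided z-test p-values, a single fixed effect size, and made-up counts of genes upregulated in both tissues, in one tissue only, or in neither). It calls each tissue at FDR 10% with Benjamini-Hochberg, intersects the lists, and measures how often genes in C are not truly upregulated in both tissues, plus how many truly common genes C misses. Every parameter value is hypothetical, so the numbers it prints depend on those choices and say nothing about any particular dataset.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal, used to turn z-scores into p-values

def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: return indices of `pvals` called at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank  # largest rank under the BH line
    return set(order[:k])

def one_sided_p(effect, rng):
    """One-sided z-test p-value for upregulation; effect=0.0 gives a uniform null p-value."""
    z = rng.gauss(effect, 1.0)
    return 1.0 - STD_NORMAL.cdf(z)

def simulate(n_both=100, n_liver_only=300, n_brain_only=300, n_null=4300,
             effect=3.0, q=0.10, reps=20, seed=1):
    """Average false discovery proportion of C and fraction of truly common genes missed."""
    rng = random.Random(seed)
    # Hypothetical ground truth: gene indices grouped by where they are truly upregulated
    both = set(range(n_both))
    liver_only = set(range(n_both, n_both + n_liver_only))
    start = n_both + n_liver_only
    brain_only = set(range(start, start + n_brain_only))
    m = start + n_brain_only + n_null
    up_liver = both | liver_only
    up_brain = both | brain_only
    fdp, missed = [], []
    for _ in range(reps):
        # Liver and brain are analysed independently, as in the question
        liver_p = [one_sided_p(effect if g in up_liver else 0.0, rng) for g in range(m)]
        brain_p = [one_sided_p(effect if g in up_brain else 0.0, rng) for g in range(m)]
        c = bh_reject(liver_p, q) & bh_reject(brain_p, q)  # Venn-diagram overlap
        fdp.append(len(c - both) / max(len(c), 1))         # calls in C not truly up in both
        missed.append(len(both - c) / len(both))           # truly common genes absent from C
    return mean(fdp), mean(missed)

fdp_c, missed_c = simulate()
print(f"mean FDP of C: {fdp_c:.3f}; mean fraction of common genes missed: {missed_c:.3f}")
```

Varying `effect` or the relative sizes of the categories lets you see how the error rate of C, and the rate at which truly common genes are dropped from it, move with those assumptions.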
If so, how would one go about generating the set C properly?
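One option discussed in the multiple-testing literature is to test the "common" hypothesis directly rather than intersecting two separately thresholded lists. For each gene, the null "not upregulated in both tissues" holds whenever at least one tissue is null, so max(p_liver, p_brain) is a valid, if conservative, p-value for it (an intersection-union test); a single Benjamini-Hochberg correction over those values then gives one FDR statement about C, under the usual independence or positive-dependence assumptions across genes. A minimal sketch, assuming one-sided p-values from both analyses over the same gene universe; the p-values are hypothetical, and this is one approach among several rather than a definitive answer.

```python
def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: return the genes in `pvals` called at FDR level q."""
    m = len(pvals)
    ranked = sorted(pvals, key=pvals.get)  # genes in ascending p-value order
    k = 0
    for rank, gene in enumerate(ranked, start=1):
        if pvals[gene] <= rank * q / m:
            k = rank  # largest rank under the BH line
    return set(ranked[:k])

# Hypothetical one-sided p-values for "upregulated under treatment X";
# assumes both dicts cover the same genes
liver_p = {"g1": 0.001, "g2": 0.004, "g3": 0.03, "g4": 0.20, "g5": 0.60}
brain_p = {"g1": 0.002, "g2": 0.35, "g3": 0.01, "g4": 0.008, "g5": 0.90}

# Conjunction p-value: small only if the gene looks upregulated in BOTH tissues
conj_p = {g: max(liver_p[g], brain_p[g]) for g in liver_p}
C = bh_reject(conj_p, q=0.10)  # FDR is controlled for the common set itself
print(sorted(C))               # ['g1', 'g3']
```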