Question

How to compare two sets of results from Limma?

4.3 years ago

Colari19 ▴ 90

Hi,

Say I've carried out two differential expression analyses using the limma-voom pipeline on two datasets, A and B.

How could one go about comparing the results from these two analyses to see how similar or different they are?

For example, there may be genes that are differentially expressed in set A but not in set B, and vice versa.

Also, there may be genes that are differentially expressed in both set A and set B, but are going in opposite directions.

There also may be genes that are behaving similarly in set A and in set B.

This is quite a broad question, but I'm wondering whether anyone has ideas about how to investigate this. In my particular case I'd like to see how well the results from set A replicate in set B. Is the best way just to count the number of genes that fall into each of the above categories?

Thank you

RNA-Seq limma R voom • 2.3k views

updated 4.3 years ago by Kristoffer Vitting-Seerup ★ 4.0k • written 4.3 years ago by Colari19 ▴ 90
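If you do want to count the categories listed in the question, a minimal sketch might look like the following. It assumes each analysis has been exported as a mapping of gene ID to (logFC, adjusted p-value), for example from the `logFC` and `adj.P.Val` columns of limma's `topTable(fit, number = Inf)`. The 0.05 FDR cut-off, the category names and the toy values are illustrative assumptions, not part of the original question.

```python
from collections import Counter

ALPHA = 0.05  # assumed FDR cut-off for calling a gene DE


def classify(res_a, res_b, alpha=ALPHA):
    """Map each gene tested in both analyses to one comparison category.

    res_a, res_b: dict of gene ID -> (logFC, adjusted p-value).
    """
    categories = {}
    for gene in sorted(res_a.keys() & res_b.keys()):  # genes present in both tables
        lfc_a, padj_a = res_a[gene]
        lfc_b, padj_b = res_b[gene]
        de_a, de_b = padj_a < alpha, padj_b < alpha
        if de_a and de_b:
            same_direction = (lfc_a > 0) == (lfc_b > 0)
            categories[gene] = "both_same" if same_direction else "both_opposite"
        elif de_a:
            categories[gene] = "A_only"
        elif de_b:
            categories[gene] = "B_only"
        else:
            categories[gene] = "neither"
    return categories


if __name__ == "__main__":
    # Toy values for illustration only, not real results.
    res_a = {"g1": (2.0, 0.001), "g2": (-1.5, 0.01), "g3": (0.2, 0.80), "g4": (1.1, 0.03)}
    res_b = {"g1": (1.8, 0.002), "g2": (1.2, 0.02), "g3": (1.0, 0.01), "g4": (0.1, 0.90)}
    print(Counter(classify(res_a, res_b).values()))
```

Keep in mind that the "A_only" and "B_only" bins depend entirely on the cut-off: a gene falling just above the threshold in B is not evidence that it has no effect in B.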

Comparing the actual gene sets is statistically VERY problematic. Have you considered simply performing the analysis as if it were a single (multifactorial) dataset? There may be a batch effect, but that approach is statistically more robust.

4.3 years ago by Devon Ryan 104k
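As a rough illustration of the single multifactorial model suggested above, here is a self-contained toy that builds the same treatment-coded design matrix R's `model.matrix(~ dataset * condition)` would give. It assumes both datasets contain the same two conditions, hypothetically named `ctrl` and `trt`. The `datasetB` column absorbs the batch shift between datasets; the interaction column is an addition of this sketch, not something the reply specifies, and estimates how much the treatment effect in B differs from that in A. In limma such a design would go into `voom()`, `lmFit()` and `eBayes()`, followed by `topTable()` on the coefficient of interest.

```python
# Sketch of a combined design: ~ dataset * condition, with A and ctrl as baselines.
# Hypothetical sample sheet: (dataset, condition) for each library.
SAMPLES = [
    ("A", "ctrl"), ("A", "ctrl"), ("A", "trt"), ("A", "trt"),
    ("B", "ctrl"), ("B", "ctrl"), ("B", "trt"), ("B", "trt"),
]

COLUMNS = ["(Intercept)", "datasetB", "conditiontrt", "datasetB:conditiontrt"]


def design_row(dataset, condition):
    """Treatment-coded row: intercept, batch shift, effect in A, extra effect in B."""
    is_b = 1 if dataset == "B" else 0
    is_trt = 1 if condition == "trt" else 0
    return [1, is_b, is_trt, is_b * is_trt]


def design_matrix(samples):
    """One design row per sample, in the same order as the sample sheet."""
    return [design_row(d, c) for d, c in samples]


if __name__ == "__main__":
    print("\t".join(COLUMNS))
    for row in design_matrix(SAMPLES):
        print("\t".join(str(v) for v in row))
```

Under this coding the treatment effect in B is `conditiontrt + datasetB:conditiontrt`, so genes with a large interaction coefficient are those whose response differs between the datasets.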

I guess you would need to show that the overall expression changes are similar between these datasets. How about using something like Gene Set Enrichment Analysis (GSEA) to show that the differentially expressed genes lead to enrichment for certain gene sets / pathways? Comparing datasets 1:1 is always problematic, because things like different library preparation kits can lead to different results regardless of the underlying biology. Therefore I always find it helpful to check whether the biological message is the same in both datasets. Alternatively, you could cluster both datasets, assign KEGG (or similar) pathways to the different clusters, and then see whether they are biologically comparable.

4.3 years ago by ATpoint 81k

4.3 years ago

seidel 11k

The simple approach: select the DE genes from each dataset and examine their overlap with a Venn diagram. Do the DE genes overlap overall? Do the genes in each direction (up, down) overlap? You can evaluate the significance of the overlap using the hypergeometric distribution or Fisher's exact test (see the help for phyper() if you know R; your A and B experiments are set up like drawing balls from an urn). You can ask specific questions about gene sets in each experiment using the geneSetTest() function in the limma package.
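The urn set-up can be written out directly. A minimal sketch, assuming the universe is the set of genes tested in both analyses: the genes DE in A are the "white balls", the genes DE in B are the draws, and the p-value is the upper tail P(overlap ≥ k). In R the same number should come from `phyper(k - 1, n_a, n_universe - n_a, n_b, lower.tail = FALSE)` or from `fisher.test(..., alternative = "greater")` on the 2×2 table. The gene IDs below are toy placeholders; the test would be run separately for up- and down-regulated genes.

```python
from math import comb


def overlap_pvalue(n_universe, n_a, n_b, k):
    """One-sided P(overlap >= k) under the hypergeometric null.

    n_universe: genes tested in both analyses (the urn)
    n_a:        DE genes in A (the "white balls")
    n_b:        DE genes in B (the number of draws)
    k:          observed overlap |A & B|
    """
    upper = min(n_a, n_b)
    # Exact integer sum of P(X = i) numerators for i = k .. upper.
    numerator = sum(
        comb(n_a, i) * comb(n_universe - n_a, n_b - i) for i in range(k, upper + 1)
    )
    return numerator / comb(n_universe, n_b)


if __name__ == "__main__":
    # Toy gene IDs for illustration only, not real results.
    universe = {f"g{i}" for i in range(1, 21)}
    de_a = {"g1", "g2", "g3", "g4", "g5"}
    de_b = {"g1", "g2", "g3", "g6"}
    k = len(de_a & de_b)  # the Venn-diagram intersection
    expected = len(de_a) * len(de_b) / len(universe)
    p = overlap_pvalue(len(universe), len(de_a), len(de_b), k)
    print(f"overlap={k}, expected={expected:.2f}, p={p:.3g}")
```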

This is the quickest, least complicated approach I know of. If the top set of genes from one experiment is no more enriched for genes from the top set of a second experiment than a random selection of genes would be, then, in general, the experiments are not returning similar results. Conversely, if you expect the results from A to be replicated in B, you should see some overlap.
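The "random selection of genes" comparison can also be checked empirically. This sketch assumes `top_a` is the top set from A, `n_b` is the size of the top set from B, and `universe` holds all genes tested in both. It draws random gene sets of size `n_b` and counts how often they overlap `top_a` at least as much as the observed set does. The add-one correction in the p-value and the default of 10,000 draws are conventional choices, not part of the answer.

```python
import random


def expected_overlap(n_universe, n_a, n_b):
    """Mean overlap of two unrelated gene sets of sizes n_a and n_b."""
    return n_a * n_b / n_universe


def random_overlaps(universe, top_a, n_b, n_iter=10_000, seed=1):
    """Overlap of top_a with n_iter random gene sets of size n_b drawn from universe."""
    rng = random.Random(seed)
    genes = sorted(universe)  # sorted so a fixed seed gives reproducible draws
    return [len(top_a.intersection(rng.sample(genes, n_b))) for _ in range(n_iter)]


def empirical_pvalue(observed, null_overlaps):
    """Share of random sets overlapping top_a at least as much (add-one corrected)."""
    hits = sum(1 for x in null_overlaps if x >= observed)
    return (hits + 1) / (len(null_overlaps) + 1)


if __name__ == "__main__":
    # Toy gene IDs for illustration only, not real results.
    universe = {f"g{i}" for i in range(1, 201)}
    top_a = {f"g{i}" for i in range(1, 21)}
    top_b = {f"g{i}" for i in range(11, 41)}
    observed = len(top_a & top_b)
    null = random_overlaps(universe, top_a, len(top_b))
    print(observed, expected_overlap(len(universe), len(top_a), len(top_b)),
          empirical_pvalue(observed, null))
```

With enough draws this should agree closely with the hypergeometric p-value, since both ask the same question; the simulation is mainly useful as a sanity check.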

score 2 · Accepted Answer · 2020-01-09

I would start by comparing the log2FC of all genes analyzed in both studies: it might be that some genes are not DE in one study but their log2FCs are similar. One way to do this is to plot the log2FCs against each other, colouring each gene by its DE status in the two models. You can also do this only for the genes that are DE in at least one study.
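A minimal sketch of that comparison, assuming the same gene → (logFC, adjusted p-value) tables as input and a 0.05 cut-off. It returns the scatter-plot points with a colour label per gene, plus a Pearson correlation over all shared genes and over genes DE in at least one study. Pearson correlation is one assumed summary of agreement, and the points can be passed to any plotting tool (in R, `plot()` with `col` set from the label). Note that this simple correlation does not separate technical from biological correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (needs n >= 2 and non-zero variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def de_label(padj_a, padj_b, alpha=0.05):
    """Colour label for the scatter plot: DE status in the two models."""
    if padj_a < alpha and padj_b < alpha:
        return "both"
    if padj_a < alpha:
        return "A only"
    if padj_b < alpha:
        return "B only"
    return "neither"


def compare_logfc(res_a, res_b, alpha=0.05):
    """Return scatter points (logFC_A, logFC_B, label) plus two correlations."""
    shared = sorted(res_a.keys() & res_b.keys())
    points = [
        (res_a[g][0], res_b[g][0], de_label(res_a[g][1], res_b[g][1], alpha))
        for g in shared
    ]
    r_all = pearson([p[0] for p in points], [p[1] for p in points])
    de_any = [p for p in points if p[2] != "neither"]
    # Only compute the DE-restricted correlation when there are enough points.
    r_de = pearson([p[0] for p in de_any], [p[1] for p in de_any]) if len(de_any) > 2 else None
    return points, r_all, r_de


if __name__ == "__main__":
    # Toy values for illustration only, not real results.
    res_a = {"g1": (2.0, 0.001), "g2": (-1.5, 0.01), "g3": (0.2, 0.80), "g4": (1.1, 0.03)}
    res_b = {"g1": (1.8, 0.002), "g2": (-1.2, 0.02), "g3": (0.4, 0.60), "g4": (0.9, 0.20)}
    points, r_all, r_de = compare_logfc(res_a, res_b)
    for x, y, label in points:
        print(f"{x:6.2f} {y:6.2f} {label}")
    print("r (all shared genes):", round(r_all, 3))
    print("r (DE in at least one):", r_de if r_de is None else round(r_de, 3))
```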

If you want a more advanced version, you can use limma's genas() function, which lets you distinguish between technical and biological correlation.