Question

Aggregating RNA-Seq log2FC & test statistics for GSEA ranking

3.8 years ago

monovich • 0

A question was posed to me by a colleague today, and I'm not entirely sure I have a good answer. For my own knowledge and for the benefit of my colleague, I'd like to know what you'd do to address their specific proposal. I have a few of my own ideas about how one could do this, but I'm not super confident in their technical "correctness".

Their idea:

You have 3 tables of DESeq2 results. Each table represents a contrast between the same treatment and control (no treatment), but in a different cell line. You'd like to aggregate these results to identify changes that are consistent across the cell lines. You'd also like to use the aggregated data in a downstream gene set enrichment analysis to identify biological processes associated with the treatment, so you'll need some way of ranking the genes. In this scenario, how would you aggregate the gene expression changes/p-values across the three contrasts, and what calculated statistic should you use to rank the consensus genes?

Thanks!

RNA-Seq meta-analysis GSEA • 1.3k views

updated 3.8 years ago by GouthamAtla 12k • written 3.8 years ago by monovich • 0

First, I would take the fold change of each gene in each cell line relative to its controls and create a heatmap (a fold-change matrix where cell lines are columns and genes are rows) to get a sense of the magnitude of the "response" across cell lines and how much of it is shared. The heatmap reveals groups of genes that show consistent up/down regulation across cell lines, and other genes that may change in only one cell line. You could pre-select genes, i.e. keep genes that show the same direction of effect in at least 2 cell lines, just to reduce noise and to see clearer patterns.
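A minimal sketch of that pre-selection step, assuming the log2 fold changes from each DESeq2 table have already been loaded into plain dictionaries keyed by gene. The function name `consistent_genes`, the cell-line labels, and the toy values are hypothetical and only for illustration.

```python
def consistent_genes(lfc_by_line, min_lines=2):
    """Return {gene: "up" | "down"} for genes whose log2FC has the same
    sign in at least `min_lines` cell lines.

    lfc_by_line: dict mapping cell line -> {gene: log2FoldChange}.
    Missing genes or None values (e.g. NA in a results table) are not counted.
    """
    genes = set()
    for table in lfc_by_line.values():
        genes.update(table)

    selected = {}
    for gene in sorted(genes):
        values = [t[gene] for t in lfc_by_line.values() if t.get(gene) is not None]
        n_up = sum(v > 0 for v in values)
        n_down = sum(v < 0 for v in values)
        if n_up >= min_lines and n_up > n_down:
            selected[gene] = "up"
        elif n_down >= min_lines and n_down > n_up:
            selected[gene] = "down"
    return selected


# Toy input (made-up numbers): three cell lines, three genes.
toy = {
    "lineA": {"G1": 1.2, "G2": -0.5, "G3": 0.3},
    "lineB": {"G1": 0.8, "G2": 0.4, "G3": -0.1},
    "lineC": {"G1": 1.5, "G2": -0.9, "G3": None},
}
print(consistent_genes(toy))
```

This filter only looks at the sign of the fold change, so it is best treated as a way to tidy up the heatmap rather than as a statistical test.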

If you want a statistical assessment, you could borrow the meta-analysis concept from GWAS/eQTL studies, which assess whether the effect size (here, fold change) has the same direction of effect and is significant across cohorts (here, cell lines). This doesn't directly assess replication, but it will give you signals that are consistent across the three cell lines. It amounts to an 'aggregation' of effect sizes and p-values to look for consistency. There are also methods that check for replication, but I am not sure whether they give any gene-wise metric (which you would need for GSEA). As far as I know, they give a single metric for the extent of replication.
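One common way to do this kind of aggregation is Stouffer's method, which combines signed z-scores so that agreement in direction is rewarded and disagreement cancels out. The sketch below assumes the results come from the DESeq2 Wald test, where the `stat` column (log2FoldChange / lfcSE) is approximately standard normal under the null; it is not something prescribed in this thread, and the function names and toy numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_z(z_scores, weights=None):
    """Combine signed per-cell-line z-scores into one z-score (Stouffer's method).

    With no weights, every cell line contributes equally.
    """
    if weights is None:
        weights = [1.0] * len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value for a standard-normal z-score."""
    return 2.0 * NormalDist().cdf(-abs(z))


def rank_genes(stat_by_gene):
    """Return (gene, combined_z) pairs sorted from most up to most down,
    i.e. a ranking that could be fed to a pre-ranked GSEA."""
    combined = {g: stouffer_z(zs) for g, zs in stat_by_gene.items()}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy Wald statistics (made-up numbers), one value per cell line.
toy_stats = {
    "G1": [2.0, 2.0, 2.0],    # consistent up
    "G2": [3.0, -3.0, 0.0],   # directions disagree, so they cancel
    "G3": [-1.5, -2.5, -2.0], # consistent down
}
for gene, z in rank_genes(toy_stats):
    print(gene, round(z, 3), two_sided_p(z))
```

The combined z-score is signed, so it can serve directly as the ranking statistic. This treats the three contrasts as independent, which is an assumption worth checking if the cell lines share control samples.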

Another option is to check whether the differential expression method can model what you are looking for, i.e. set up the model so that differential expression is tested for all treatments vs. all controls as one group, which gives you significant genes only if they are consistently up- or down-regulated across cell lines.

3.8 years ago by GouthamAtla 12k

@geek_y Thanks for the suggestions! I think we're pretty much on the same wavelength.

A clustered heatmap of log2FoldChanges is a great visual recommendation for this sort of comparison and something that I recommended to my colleague as they put together figures for their manuscript.

Regarding statistical aggregation/assessment, my suggestion to my colleague was to just remodel the entire comparison in DESeq2 as ~line + treatment, pooling all the different cell lines as you have suggested. I'm not sure whether he has access to the relevant count matrices to fit this model and run the contrast.

I do know he was hoping there would be an "equation" of sorts that one could plug the log2FoldChange, test statistic, standard error, significance, etc. into, and more or less weight an "average" log2FoldChange by the significance of the individual log2FoldChanges. This is where I'm not really sure I have any recommendation. Usually I would suggest just introducing a significance cutoff and, as you suggest, only considering the genes whose log2FoldChange has the same sign. However, this isn't really compatible with a GSEA, in which you provide a ranked list of as many of the expressed genes as possible. I think your idea about borrowing concepts from meta-analysis is correct, but I'm not totally sure how it would be done.
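One candidate for such an "equation" is the standard fixed-effect (inverse-variance weighted) meta-analysis estimate, where each log2FoldChange is weighted by 1 / lfcSE^2 so that more precise estimates count for more. The sketch below is an assumption about what might fit here, not something confirmed in this thread; it needs only the `log2FoldChange` and `lfcSE` columns from each DESeq2 table, and the function name and toy numbers are hypothetical.

```python
from math import sqrt


def ivw_log2fc(log2fcs, ses):
    """Fixed-effect (inverse-variance weighted) pooling of log2 fold changes.

    log2fcs: per-cell-line log2FoldChange values for one gene.
    ses:     matching standard errors (lfcSE), all > 0.
    Returns (pooled_log2fc, pooled_se, z), where z = pooled_log2fc / pooled_se.
    """
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log2fcs)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled, pooled_se, pooled / pooled_se


# Toy example (made-up numbers): the first estimate is twice as precise,
# so it gets four times the weight.
pooled, se, z = ivw_log2fc([1.0, 3.0], [0.5, 1.0])
print(pooled, se, z)
```

The resulting z is signed and defined for every gene, so it could be used to rank the full gene list without a significance cutoff. The fixed-effect model assumes the cell lines share one underlying effect and that the three estimates are independent; if the response is expected to differ between lines, a random-effects model may be more appropriate.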

3.8 years ago by monovich • 0