Question

GSEA for paired RNA-Seq data

4.6 years ago

bjeremy • 0

Hi all,

I have samples from several individuals at two time points (e.g., individual 1 day 1, individual 1 day 30, individual 2 day 1, individual 2 day 30, etc.). Each individual has two samples, one at each time point. I have already tested each gene individually for differential expression, and now I would like to perform GSEA (http://software.broadinstitute.org/gsea/index.jsp) or a similar analysis on my RNA-seq data to identify groups of genes that are collectively differentially expressed between the two time points (essentially the same goal as the unanswered question found here: Pathway Analysis For Paired Microarray Data & Paired Rna-Seq Data?). The GSEA FAQ says it cannot be used directly with paired samples, but that "if you create a ranked list of genes by running a paired-sample marker analysis outside of GSEA, you can use GSEA to analyze that ranked list of genes."

1) Is anyone aware of a documented, robust way to rank genes using paired samples as described above?

2) If not, is there a method similar to GSEA that takes into account a paired design?

Thank you!

RNA-Seq rna-seq gsea • 1.9k views

score 2 · Accepted Answer · 2019-10-01

4.6 years ago

Kristoffer Vitting-Seerup ★ 4.0k

The easiest way is probably to use edgeR for the differential expression analysis (simply add the patient ID as a covariate in the model matrix to account for the pairing) and then pass that same design directly to the gene-set testing tools built into limma/edgeR, which then also take the paired design into account. From the edgeR/limma packages I'd recommend camera() for a competitive test or fry() for a self-contained test.
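A minimal sketch of the approach described above, assuming the edgeR and limma packages are installed. The count matrix, the sample layout, and the two gene sets (`setA`, `setB`) are simulated, hypothetical stand-ins for real data. The sketch passes the DGEList `y` after estimateDisp() to camera() and fry(), which matches the camera.DGEList() documentation quoted later in this thread.

```r
library(edgeR)  # attaches limma as well

set.seed(1)
n_genes <- 1000
n_ind <- 5

# Hypothetical simulated counts: columns are 5 individuals at day1, then the same 5 at day30
counts <- matrix(rnbinom(n_genes * n_ind * 2, mu = 100, size = 10), nrow = n_genes)
rownames(counts) <- paste0("gene", seq_len(n_genes))

individual <- factor(rep(seq_len(n_ind), times = 2))
timepoint <- factor(rep(c("day1", "day30"), each = n_ind), levels = c("day1", "day30"))
colnames(counts) <- paste0("ind", individual, "_", timepoint)

# Paired design: individual enters as a blocking covariate, time point is the effect of interest
design <- model.matrix(~ individual + timepoint)

y <- DGEList(counts = counts)
y <- y[filterByExpr(y, design), , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)
y <- estimateDisp(y, design)

# Hypothetical gene sets, converted to row indices of y
gene_sets <- list(
  setA = paste0("gene", 1:50),
  setB = paste0("gene", 51:120)
)
idx <- ids2indices(gene_sets, rownames(y))

# The last design column is the day30 vs day1 coefficient
coef_time <- ncol(design)

cam <- camera(y, index = idx, design = design, contrast = coef_time)  # competitive test
fr <- fry(y, index = idx, design = design, contrast = coef_time)      # self-contained test

print(cam)
print(fr)
```

Both calls return one row per gene set. Because the design matrix is passed to the gene-set test itself, the individual effect is adjusted for there as well, not only in the gene-level analysis.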

This looks like exactly what I’m looking for. Thank you! A couple of clarifying questions (I’m still new to edgeR and limma):

1) Should I use the camera() function (as shown here in section 7: https://www.bioconductor.org/packages/devel/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html#software-and-code-used) or the camera.DGEList() function (https://www.rdocumentation.org/packages/edgeR/versions/3.14.0/topics/camera.DGEList)?

2) At what point in the workflow given here (section 4.1, https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf#page44) and with which object should I use the function? The documentation for the camera.DGEList() function says that it takes a DGEList object containing dispersion estimates, but I wanted to verify that passing in the object “y” after calling estimateDisp() on it is the correct way to ensure that the paired design is taken into account.

Thank you!

written 4.6 years ago by bjeremy

1) If you just call camera(), it will automatically use camera.DGEList() when you supply a DGEList. 2) Just use camera on the lrt object in step 4.1.9.
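For the first question in the thread (a ranked gene list from a paired analysis, for use with GSEA's preranked mode), here is a hedged sketch assuming edgeR is installed. The counts are simulated and hypothetical. The signed square root of the likelihood-ratio statistic is one common ranking choice, not something prescribed by GSEA or edgeR, and the output file name is made up for illustration.

```r
library(edgeR)

set.seed(2)
n_genes <- 500
n_ind <- 4

# Hypothetical simulated counts: 4 individuals at day1, then the same 4 at day30
counts <- matrix(rnbinom(n_genes * n_ind * 2, mu = 50, size = 5), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
individual <- factor(rep(seq_len(n_ind), times = 2))
timepoint <- factor(rep(c("day1", "day30"), each = n_ind), levels = c("day1", "day30"))

# Paired design: individual as blocking covariate, time point as the tested effect
design <- model.matrix(~ individual + timepoint)

y <- DGEList(counts = counts)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)

# Gene-level paired test for the time point coefficient (last design column)
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = ncol(design))

# Ranking metric (an assumption): direction of change times sqrt of the LR statistic
tab <- lrt$table
score <- sign(tab$logFC) * sqrt(pmax(tab$LR, 0))
ranked <- data.frame(gene = rownames(tab), score = score)
ranked <- ranked[order(ranked$score, decreasing = TRUE), ]

# Two-column, tab-separated file without a header, written to a temporary directory
out_file <- file.path(tempdir(), "paired_ranked_genes.rnk")
write.table(ranked, file = out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

The pairing enters only through the design matrix, so the resulting scores already reflect the within-individual comparison. For the gene-set tests themselves, the documentation quoted earlier in the thread describes camera.DGEList() as taking a DGEList with dispersion estimates, so passing `y` after estimateDisp() is the documented route; whether camera() also accepts the `lrt` object is not verified here.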