Question

How to plot KEGG gseaResult using metadata?

2.2 years ago

needHelpWkeggGSEA ▴ 10

Hello all!

I am very new to R and DEG analyses, so please bear with me...

I normalized a set of raw counts using DESeq2. To do this, I also fed in metadata which distinguished my samples by "Patient". In other words, the only factor which distinguished the columns (samples) of my raw counts matrix was the patient they were sourced from (n=5).

From there (after some preparatory filtering, sorting and data mining), I performed KEGG GSEA with clusterProfiler, using the l2fc values obtained from results(dds, tidy = TRUE). In doing so, I successfully obtained a gseaResult object containing upregulated and downregulated KEGG pathways (as indicated by the sign of $NES).

HOWEVER, this gseaResult object does NOT provide a separate NES for each "Patient". Rather, a single NES value is reported for each pathway. This is understandable because the l2fc input is not split by metadata, nor do the dds results split l2fc by the metadata I supplied.

My ultimate goal is to generate a heatmap of pathway NES values for EACH patient (n=5). I presume this is possible... but I am having trouble seeing it through... Any help is much appreciated!

Please let me know if I can provide any further clarifying information.

Tags: counts, DESeq2, Raw

score 1 · Answer 1 · 2022-03-16

2.2 years ago

jv ★ 1.8k

As you noted, GSEA and over-representation analysis are both based on differential gene expression results from some comparison between samples (a ranked gene list for GSEA, a thresholded gene list for over-representation analysis). I recommend GSVA (gene set variation analysis) for looking at pathway enrichment on a sample-by-sample basis.

https://www.bioconductor.org/packages/release/bioc/html/GSVA.html
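To make the ranking point concrete, here is a minimal sketch of an unweighted running-sum enrichment score, assuming a toy ranked gene list. It is not clusterProfiler's implementation, and it returns a raw ES rather than an NES (normalization requires permutations, which are omitted here). The function name `enrichment_score` and the gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from highest to lowest l2fc (one comparison).
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a gene-set member, step down otherwise.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list (hypothetical): g1 has the highest l2fc, g8 the lowest.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
top_set = {"g1", "g2", "g3"}      # members cluster at the top of the ranking
bottom_set = {"g6", "g7", "g8"}   # members cluster at the bottom

es_top = enrichment_score(ranked, top_set)
es_bottom = enrichment_score(ranked, bottom_set)
print("top_set ES:", round(es_top, 2))
print("bottom_set ES:", round(es_bottom, 2))
```

A positive score means the set's genes sit near the top of that one ranked list, a negative score means they sit near the bottom. Because the input is a single ranking from a single comparison, the output is one score per gene set, not one per sample.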

Thank you... I will certainly check GSVA out.

On the topic of GSEA applied to a single, non-binary factor though, may I ask what exactly is the significance of +NES and -NES values?

To provide further clarity, the output of resultsNames(dds) is as follows:

[1] "Intercept"                      "Patient_Patient_1_vs_Patient_2"
[3] "Patient_Patient_3_vs_Patient_2" "Patient_Patient_4_vs_Patient_2"
[5] "Patient_Patient_5_vs_Patient_2"

I presume a significant KEGG pathway GSEA result with a positive NES value in this context means that its controlling gene set reported higher l2fc values... and this pathway is thereby deemed upregulated among the provided samples. However, this GSEA result says nada regarding whether say Patient 1 upregulates or downregulates X pathway. In my head, it does, however, suggest that the provided patient groups evidence differential regulation of the significant KEGG pathway GSEA results (because the contributing/controlling gene sets were previously deemed differentially expressed between these same patient groups by the preceding DESeq2 analysis).

The only thing I cannot wrap my head around is how to report an upregulated KEGG pathway GSEA result which represents input data from 5 "conditions"... is X pathway just reportedly upregulated more times than not in the sample population? If so, would one report along the lines of "X pathway reports mean positive NES values among the selected population BUT it is differentially enriched"?

Thank you, again, for any insights!

Reply • 2.2 years ago by needHelpWkeggGSEA ▴ 10

One thing I didn't make clear in my answer is that GSVA is performed independent of differential gene expression results, i.e. you use your counts matrix as input, not DESeq2 results.

While I'm not sure I quite understand what you are asking, it does seem to me that one issue you are having is that there doesn't appear to be a "control" patient sample which can be used as a reference for comparing the gene expression of the 5 patient samples that you discuss here.

> The only thing I cannot wrap my head around is how to report an upregulated KEGG pathway GSEA result which represents input data from 5 "conditions"... is X pathway just reportedly upregulated more times than not in the sample population?

Short answer: I don't think this is feasible given your sample metadata and experimental design, because of the lack of a control sample. I believe GSVA combined with sample clustering will be more useful to you in trying to answer the above question than GSEA. The idea with GSVA is that you calculate pathway enrichment scores for each sample; then you can ask questions like "which pathways are positively or negatively enriched in all 5 patient samples?".
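As a rough illustration of what "one score per pathway per sample" looks like, here is a minimal sketch that averages per-gene z-scores within each gene set. This is a deliberately simplified stand-in, not the GSVA algorithm; for real data, use the GSVA package. All values, gene names, and pathway names below are hypothetical.

```python
from statistics import mean, pstdev

# Toy expression matrix (hypothetical values): gene -> one value per patient.
samples = ["Patient_1", "Patient_2", "Patient_3", "Patient_4", "Patient_5"]
expr = {
    "geneA": [10.0, 12.0, 30.0, 11.0, 9.0],
    "geneB": [5.0, 6.0, 15.0, 5.5, 4.5],
    "geneC": [20.0, 19.0, 21.0, 5.0, 20.0],
    "geneD": [8.0, 8.5, 7.5, 2.0, 8.0],
}
gene_sets = {"pathway_X": ["geneA", "geneB"], "pathway_Y": ["geneC", "geneD"]}


def zscores(values):
    """Standardize one gene's values across samples."""
    m, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - m) / sd for v in values]


def score_gene_sets(expr, gene_sets, n_samples):
    """Return pathway -> list of per-sample scores (mean z-score of member genes)."""
    z = {gene: zscores(values) for gene, values in expr.items()}
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in z]
        if not present:
            continue  # skip gene sets with no measured genes
        scores[name] = [mean(z[g][i] for g in present) for i in range(n_samples)]
    return scores


scores = score_gene_sets(expr, gene_sets, len(samples))
for name, row in scores.items():
    print(name, dict(zip(samples, (round(s, 2) for s in row))))
```

The result is a pathway-by-sample matrix, which is the shape needed for a per-patient heatmap or for clustering samples. Note that no differential expression step and no control sample are involved.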

Regarding "differential enrichment" using GSVA, one can use limma methods to ask which pathways show a statistically significant increase or decrease in GSVA enrichment scores between two groups of samples. An example is provided in the GSVA documentation.

Reply • 2.2 years ago by jv ★ 1.8k

Ah… thank you very much for the clarification, it was much needed.

So, to reiterate: feeding differential gene expression results into GSEA only makes sense if these DEGs were obtained against a control. Expanding on this, if I did have a control variable to compare against each non-control sample characteristic, could I expect my dds object to have a column of l2fc values for EACH characteristic-control comparison? In other words, say I had 5 patients with X tumor and 1 control patient, would my DEG results from DESeq contain 5 separate l2fc columns? (Which I could then feed into GSEA to obtain 5 distinct GSEA results)

I apologize if this too is a silly question!

Reply • 2.2 years ago by needHelpWkeggGSEA ▴ 10

> So, to reiterate: feeding differential gene expression results into GSEA only makes sense if these DEGs were obtained against a control.

Not necessarily, it all depends on what questions you are trying to answer with this experiment. I only noted the idea of a control sample since it seems like you are interested in profiling the commonalities of these 5 samples as opposed to directly contrasting them in one-to-one differential expression analyses.

> say I had 5 patients with X tumor and 1 control patient, would my DEG results from DESeq contain 5 separate l2fc columns? (Which I could then feed into GSEA to obtain 5 distinct GSEA results)

You would have 5 separate DESeq2 results tables, one for each patient, OR, if your 5 samples were replicates of a specific condition, you would have just 1 results table containing the control vs. treatment contrast.
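To show the shape of the "one table per patient" case, here is a minimal sketch that builds one ranked l2fc list per patient against a single control. It uses a naive log2 ratio with a pseudocount, which is only a stand-in for DESeq2's model-based estimates; in DESeq2 itself you would extract each comparison with a separate `results()` call using the `contrast` argument. All counts and names below are hypothetical.

```python
from math import log2

# Hypothetical normalized counts for one control and five patients.
control = {"geneA": 99, "geneB": 49, "geneC": 199}
patients = {
    "Patient_1": {"geneA": 399, "geneB": 49, "geneC": 99},
    "Patient_2": {"geneA": 99, "geneB": 199, "geneC": 199},
    "Patient_3": {"geneA": 49, "geneB": 49, "geneC": 399},
    "Patient_4": {"geneA": 199, "geneB": 24, "geneC": 199},
    "Patient_5": {"geneA": 99, "geneB": 99, "geneC": 49},
}


def ranked_l2fc(sample, reference, pseudocount=1):
    """Naive log2 fold change of sample vs. reference, sorted high to low."""
    l2fc = {
        gene: log2((sample[gene] + pseudocount) / (reference[gene] + pseudocount))
        for gene in reference
    }
    return sorted(l2fc.items(), key=lambda item: item[1], reverse=True)


# One ranked list per patient-vs-control comparison.
ranked = {name: ranked_l2fc(counts, control) for name, counts in patients.items()}

for name, genes in ranked.items():
    print(name, [(g, round(fc, 2)) for g, fc in genes])
```

Each of the five ranked lists could then be passed to GSEA separately, giving five GSEA results whose NES values can be collected into a pathway-by-patient matrix.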