I would like to know which genes should be included in the expression matrix that goes into GSVA. I am working with single-cell data and have marker genes for two groups, and I would like to find the pathways that differ between these two groups. My first thought was that, since I have already identified a set of marker genes (about 100 genes per group), it would make sense to use the union of the two marker gene sets as the genes in the GSVA input. But after reading more about how GSVA/GSEA work, I now feel that the entire raw set of genes (about 14000 of them) should go in, so that the enrichment results would be more robust.
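To make my concern concrete, here is a toy sketch in Python. It is not GSVA itself; the gene names and the 50-gene pathway are made up, and only the sizes (about 14000 genes, about 100 markers per group) come from my data. It just counts how much of a pathway would even be present in the input matrix under each choice:

```python
# Toy illustration only: gene names and the pathway are hypothetical.
def coverage(gene_set, universe):
    """Fraction of a gene set's members present in the input gene universe."""
    return len(gene_set & universe) / len(gene_set)

# Sizes taken from the post: ~14000 measured genes, ~100 markers per group.
all_genes = {f"G{i}" for i in range(14000)}
markers_a = {f"G{i}" for i in range(0, 100)}
markers_b = {f"G{i}" for i in range(100, 200)}
marker_union = markers_a | markers_b

# Hypothetical 50-gene pathway: 5 members are markers, 45 are not.
pathway = {f"G{i}" for i in range(195, 200)} | {f"G{i}" for i in range(5000, 5045)}

print(coverage(pathway, all_genes))     # 1.0
print(coverage(pathway, marker_union))  # 0.1
```

With the marker union as input, only 5 of the 50 pathway genes would be visible to GSVA, which is part of why I suspect the full gene list is the better input.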
Is this true?
It would be great if somebody could explain which set of genes should go into GSVA. Is it better to give a restricted list or the entire list? Does "the more the merrier" apply here?
Thanks in advance for any responses!
P.S.: A while ago I posted a related question, "GSVA for single-cell marker genes", but did not get any responses, so I thought I would try again.