Hello Biostars,
I am trying to run a GSVA analysis to find differentially enriched pathways between two genotypes in single-cell data. I think I understand how GSVA works overall, but some parts of the method are still vague to me. So I thought I would post what makes sense to me, and I would very much appreciate any thoughts, guidance, or advice!
Seurat, the software I am using for the single-cell analysis, readily gives a list of marker genes for each cluster. After this step, if I want to find the differentially enriched pathways between two chosen clusters, what would be the best approach? One option is to take the top X (say X = 100) marker genes per cluster, extract the expression data for only these genes and the cells of the two clusters, and use that matrix as the GSVA input. The other option is to give GSVA the whole matrix of about 15K expressed genes instead of the chosen markers. This is the decision I am unable to make.
My question is: in GSVA, does the set of genes included in the input expression matrix affect the null distribution? That is, if I give it only the marker genes (about 200 genes in total), would the results be drastically different from giving it all 15K expressed genes? Of course, as I understand it, the computations are quite intensive, so the computational cost of the two scenarios would also differ hugely.
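To make the two scenarios concrete, here is a minimal Python/pandas sketch of building both input matrices. It is only an illustration under stated assumptions: Seurat and GSVA are R packages, and the toy matrix `expr` (genes × cells), the `clusters` labels, the `marker_genes` helper and the `markers` table (assumed to have `gene`, `cluster` and `avg_log2FC` columns, loosely like Seurat's FindAllMarkers output) are all hypothetical stand-ins. The last lines show why the choice matters in principle: GSVA ranks genes within each sample, so the rank of the same gene in the same cell depends on which other genes are in the matrix. (GSVA ranks kernel-CDF-transformed values; the sketch ranks raw values for simplicity.)

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows = genes, columns = cells.
rng = np.random.default_rng(0)
genes = [f"gene{i}" for i in range(1000)]
cells = [f"cell{j}" for j in range(60)]
expr = pd.DataFrame(rng.poisson(2.0, size=(len(genes), len(cells))).astype(float),
                    index=genes, columns=cells)
clusters = pd.Series(np.repeat(["A", "B", "C"], 20), index=cells)

# Assumed marker-table layout, loosely shaped like FindAllMarkers() output.
markers = pd.DataFrame({
    "gene": genes[:300],
    "cluster": np.repeat(["A", "B", "C"], 100),
    "avg_log2FC": rng.normal(1.0, 0.5, 300),
})

def marker_genes(markers, chosen, top_x):
    """Hypothetical helper: top `top_x` markers per chosen cluster, de-duplicated."""
    sub = markers[markers["cluster"].isin(chosen)]
    top = (sub.sort_values("avg_log2FC", ascending=False)
              .groupby("cluster", sort=False)
              .head(top_x))
    return list(dict.fromkeys(top["gene"]))  # keep order, drop duplicates

chosen = ["A", "B"]
keep_cells = clusters[clusters.isin(chosen)].index

# Scenario 1: only the top-X markers of the two clusters.
top_genes = marker_genes(markers, chosen, top_x=100)
expr_markers = expr.loc[top_genes, keep_cells]

# Scenario 2: every expressed gene (here: detected in at least one kept cell).
expr_all = expr.loc[(expr[keep_cells] > 0).any(axis=1), keep_cells]

# Within-cell ranks (column-wise): the same gene gets a different rank
# depending on which other genes are present in the matrix.
rank_all = expr_all.rank(axis=0)
rank_markers = expr_markers.rank(axis=0)

g, c = top_genes[0], keep_cells[0]
print(expr_markers.shape, expr_all.shape)
print(f"rank of {g} in {c}: {rank_all.loc[g, c]:.0f}/{len(expr_all)} vs "
      f"{rank_markers.loc[g, c]:.0f}/{len(expr_markers)}")
```

In the real data, the marker-only matrix would be roughly 200 genes against about 15K in the full one, so any gene-set score built on within-cell ranks is computed against two very different gene backgrounds.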
Thanks in advance for any responses!