5.1 years ago
arronar ▴ 270
I'm trying to run GSVA on microarray data with 186 KEGG gene sets. I expected the output matrix to be 186 (pathways) x samples, but it is 6 (pathways) x samples: it returns enrichment values for only 6 of the provided gene sets, not for all of them. Is that expected, or am I doing something wrong in the code?
How many samples do you have in the original microarray data? If you had 6 samples, the expected output would be 186 (pathways provided) x 6 (samples).
I have 72 samples, and I'm getting back 6 (pathways) x 72 (samples) instead of 186 (pathways) x 72 (samples).
Here is the command I'm using
gsva_res <- gsva( data, GSC.C2.msigdb$gsc[1:186], mx.diff=FALSE, verbose=TRUE)
and in the console it returns:
The gene set collection was built by
GSC.C2.msigdb <- loadGSC(file="geneSets/c2.all.v6.1.symbols.gmt", type="gmt")
OK, are you sure you are using 186 gene sets? It looks like it is subsetting by genes and not by gene sets (subsetting a GeneSetCollection by number produces this, while subsetting by name subsets by GeneSet name), or something similar. Where does the loadGSC function come from? What class is GSC.C2.msigdb$gsc?
I also tried passing it a plain list, created as KEGG <- GSC.C2.msigdb$gsc[1:186], for which class(KEGG) is list and length(KEGG) is 186, but I still get the same results.
And what kind of list is it: genes mapped to pathways, or pathways mapped to genes? Load it as a GeneSetCollection and post the printed output here. I think some gene sets may be empty, or the genes may not be mapping to the rows (probes/genes) of your expression matrix.
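One way to test that hypothesis is to count, for each gene set, how many of its genes appear among the row names of the expression matrix. This is only a minimal sketch in base R: expr and gene_sets are hypothetical toy objects standing in for the real matrix and for GSC.C2.msigdb$gsc[1:186].

```r
# Toy stand-ins (hypothetical): a small expression matrix and a named list
# of gene sets, each element being a character vector of gene symbols.
set.seed(1)
expr <- matrix(rnorm(12), nrow = 4,
               dimnames = list(c("tp53", "brca1", "EGFR", "myc"),
                               c("s1", "s2", "s3")))
gene_sets <- list(
  KEGG_A = c("TP53", "BRCA1"),
  KEGG_B = c("EGFR", "MYC"),
  KEGG_C = c("KRAS", "NRAS")
)

# Number of genes in each set that are also row names of the matrix.
# %in% is case sensitive, so "tp53" does not match "TP53".
n_mapped <- vapply(gene_sets,
                   function(g) sum(g %in% rownames(expr)),
                   integer(1))
print(n_mapped)

# Gene sets with no mapped genes at all.
unmapped_sets <- names(n_mapped)[n_mapped == 0]
print(unmapped_sets)
```

If most sets show zero mapped genes, the identifiers in the matrix and in the gene sets probably do not match (different ID type, different case, stray whitespace, and so on).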
> GSC.C2.msigdb returns:
And are the rownames of the matrix you are using the same symbol IDs as the ones in the gene sets?
Oh my god. The row.names are in lowercase while the gene sets are in uppercase. That is the cause of the problem.
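A case mismatch like this one can be repaired before calling gsva(). The sketch below is one possible approach, not necessarily what was done in this thread: it matches row names to the gene-set symbols ignoring case and then adopts the casing used in the gene sets. That is a little safer than a blanket toupper(), since a few human symbols legitimately contain lowercase letters. expr and gene_sets are hypothetical toy objects.

```r
# Toy stand-ins (hypothetical): lowercase row names, uppercase gene sets.
expr <- matrix(1:6, nrow = 3,
               dimnames = list(c("tp53", "egfr", "myc"), c("s1", "s2")))
gene_sets <- list(KEGG_A = c("TP53", "EGFR"),
                  KEGG_B = c("MYC", "KRAS"))

# All distinct symbols used by the gene sets.
universe <- unique(unlist(gene_sets))

# Fraction of gene-set symbols found among the row names before the fix.
overlap_before <- mean(universe %in% rownames(expr))

# Case-insensitive match of each row name against the gene-set symbols;
# matched rows take the spelling used in the gene sets, others are untouched.
idx <- match(toupper(rownames(expr)), toupper(universe))
hit <- !is.na(idx)
rownames(expr)[hit] <- universe[idx[hit]]

# Fraction of gene-set symbols found among the row names after the fix.
overlap_after <- mean(universe %in% rownames(expr))
print(c(before = overlap_before, after = overlap_after))
```

After the fix, rerunning the original gsva() call on the renamed matrix should score every gene set that has genes present in the matrix.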
I hope that while finding the solution to this problem you also learned how to track down your bugs and solve them. Good luck!
If you aren't applying any filter by pathway size, it might be a bug. Post it on support.bioconductor.org or https://github.com/rcastelo/GSVA/issues
I'm not applying any filter
You may not be [applying filtering], but programs always perform filtering behind the scenes. You should review all possible parameters that can be used with gsva() and then modify those that could be producing the observed effect.
From what I gather, if no statistically significant enrichment can be performed for a sample, then it will not be included.
gsva doesn't have any filtering parameter; the problem is with the
I did not read anywhere in the manual that it performs that kind of filtering.
Old thread, but you can use the gmtPathways function from the fgsea package to get the list of pathways as follows:
then run for example:
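A minimal sketch of that approach, with several assumptions: it writes a tiny hypothetical GMT file to a temporary location so that it can run on its own (in practice you would point gmt_file at something like geneSets/c2.all.v6.1.symbols.gmt), it assumes the KEGG sets can be picked out by a "KEGG_" name prefix, and the commented gsva() call simply mirrors the call used earlier in this thread. Newer GSVA releases may expect a different interface, so check the documentation for your installed version.

```r
# Hypothetical two-line GMT file: name <TAB> description <TAB> genes...
gmt_file <- tempfile(fileext = ".gmt")
writeLines(c(
  "KEGG_TOY_A\tna\tTP53\tEGFR",
  "REACTOME_TOY_B\tna\tMYC\tKRAS"
), gmt_file)

if (requireNamespace("fgsea", quietly = TRUE)) {
  # gmtPathways() returns a named list: pathway name -> character vector of genes.
  pathways <- fgsea::gmtPathways(gmt_file)

  # Select gene sets by name (assumed "KEGG_" prefix) rather than by position.
  kegg <- pathways[grepl("^KEGG_", names(pathways))]
  print(names(kegg))

  # With a real expression matrix `data` whose row names use the same
  # symbols as the gene sets, the call from earlier in the thread would be:
  # gsva_res <- GSVA::gsva(data, kegg, mx.diff = FALSE, verbose = TRUE)
}
```

Selecting by name avoids relying on the KEGG sets being the first 186 entries of the collection.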