Question: Why GSVA returns a matrix with fewer gene-sets?
0
gravatar for arronar
2.1 years ago by
arronar200
Austria
arronar200 wrote:

Hi.

I'm trying to run GSVA, using microarray data and using 186 KEGG gene-sets. While I was waiting the output matrix to be 186(pathways)xsamples , is 6(pathways)xsamples. It returns only enrichment values for only 6 of the provided gene-sets and not for all of them. Is that something expected or am I doing something wrong with the code?

Thanks.

microarray gsva • 993 views
ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by arronar200

How many samples do you have on the original microarray data? If you had 6 samples then the expected output is 186(pathways provided)x6(samples)

ADD REPLYlink written 2.1 years ago by Lluís R.890

I'm having 72 samples. and I'm getting back 6(pathways)x72(samples) instead of 1866(pathways)x72(samples)

Here is the command I'm using

gsva_res <- gsva( data, GSC.C2.msigdb$gsc[1:186], mx.diff=FALSE, verbose=TRUE)

and in the console it returns :

Estimating GSVA scores for 6 gene sets.

Computing observed enrichment scores

Estimating ECDFs in microarray data with Gaussian kernels

The gene set collection was built by

GSC.C2.msigdb <- loadGSC(file="geneSets/c2.all.v6.1.symbols.gmt", type="gmt")

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by arronar200
1

Ok, are you sure you are using 186 Gene sets?? It seems like it is subseting by genes and not gene sets (subseting by number in a GeneSetCollection produces this, while subseting by name subsets by GeneSet name) or something alike. Where does the loadGSC function come from? Which class is the GSC.C2.msigdb$gsc?

ADD REPLYlink written 2.1 years ago by Lluís R.890

loadGSC is from piano package. and class(GSC.C2.msigdb$gsc) returns "list"

Also I tried to provide it a plain list that created as KEGG <- GSC.C2.msigdb$gsc[1:186] which class(KEGG) is list and length(KEGG) is 186, but still the same results.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by arronar200
1

And which type of list is it: one with genes: pathways or pathways genes? Load it as a GeneSetCollection and post here the output of printing it. I think that there are some gene sets with empty genes, or could be that the genes are not mapped to your matrix of probes/genes per sample

ADD REPLYlink written 2.1 years ago by Lluís R.890

> GSC.C2.msigdb returns :

First 50 (out of 4738) gene set names:
 [1] "KEGG_GLYCOLYSIS..." "KEGG_CITRATE_CY..." "KEGG_PENTOSE_PH..." "KEGG_PENTOSE_AN..." "KEGG_FRUCTOSE_A..."
 [6] "KEGG_GALACTOSE_..." "KEGG_ASCORBATE_..." "KEGG_FATTY_ACID..." "KEGG_STEROID_BI..." "KEGG_PRIMARY_BI..."
[11] "KEGG_STEROID_HO..." "KEGG_OXIDATIVE_..." "KEGG_PURINE_MET..." "KEGG_PYRIMIDINE..." "KEGG_ALANINE_AS..."
[16] "KEGG_GLYCINE_SE..." "KEGG_CYSTEINE_A..." "KEGG_VALINE_LEU..." "KEGG_VALINE_LEU..." "KEGG_LYSINE_DEG..."
[21] "KEGG_ARGININE_A..." "KEGG_HISTIDINE_..." "KEGG_TYROSINE_M..." "KEGG_PHENYLALAN..." "KEGG_TRYPTOPHAN..."
[26] "KEGG_BETA_ALANI..." "KEGG_TAURINE_AN..." "KEGG_SELENOAMIN..." "KEGG_GLUTATHION..." "KEGG_STARCH_AND..."
[31] "KEGG_N_GLYCAN_B..." "KEGG_OTHER_GLYC..." "KEGG_O_GLYCAN_B..." "KEGG_AMINO_SUGA..." "KEGG_GLYCOSAMIN..."
[36] "KEGG_GLYCOSAMIN..." "KEGG_GLYCOSAMIN..." "KEGG_GLYCOSAMIN..." "KEGG_GLYCEROLIP..." "KEGG_INOSITOL_P..."
[41] "KEGG_GLYCOSYLPH..." "KEGG_GLYCEROPHO..." "KEGG_ETHER_LIPI..." "KEGG_ARACHIDONI..." "KEGG_LINOLEIC_A..."
[46] "KEGG_ALPHA_LINO..." "KEGG_SPHINGOLIP..." "KEGG_GLYCOSPHIN..." "KEGG_GLYCOSPHIN..." "KEGG_GLYCOSPHIN..."

First 50 (out of 21095) gene names:
 [1] "ACSS2"   "GCK"     "PGK2"    "PGK1"    "PDHB"    "PDHA1"   "PDHA2"   "PGM2"    "TPI1"    "ACSS1"   "FBP1"   
[12] "ADH1B"   "HK2"     "ADH1C"   "HK1"     "HK3"     "ADH4"    "PGAM2"   "ADH5"    "PGAM1"   "ADH1A"   "ALDOC"  
[23] "ALDH7A1" "LDHAL6B" "PKLR"    "LDHAL6A" "ENO1"    "PKM2"    "PFKP"    "BPGM"    "PCK2"    "PCK1"    "ALDH1B1"
[34] "ALDH2"   "ALDH3A1" "AKR1A1"  "FBP2"    "PFKM"    "PFKL"    "LDHC"    "GAPDH"   "ENO3"    "ENO2"    "PGAM4"  
[45] "ADH7"    "ADH6"    "LDHB"    "ALDH1A3" "ALDH3B1" "ALDH3B2"

Gene set size summary:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   5.00   18.00   39.00   93.86   88.00 1972.00 

First 10 gene sets with additional info:
      Gene set                  Additional info                                                  
 [1,] "KEGG_GLYCOLYSIS_GLUC..." "http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_GLYCOLY..."
 [2,] "KEGG_CITRATE_CYCLE_T..." "http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_CITRATE..."
 [3,] "KEGG_PENTOSE_PHOSPHA..." "http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_PENTOSE..."
 [4,] "KEGG_PENTOSE_AND_GLU..." "http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_PENTOSE..."
 [5,] "KEGG_FRUCTOSE_AND_MA..." "http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_FRUCTOS..."
 [6,] "KEGG_GALACTOSE_METAB..." "http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_GALACTO..."
 [7,] "KEGG_ASCORBATE_AND_A..." "http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_ASCORBA..."
 [8,] "KEGG_FATTY_ACID_META..." "http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_FATTY_A..."
 [9,] "KEGG_STEROID_BIOSYNT..." "http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_STEROID..."
[10,] "KEGG_PRIMARY_BILE_AC..." "http://www.broadinstitute.org/gsea/msigdb/cards/KEGG_PRIMARY..."

while > GSC.C2.msigdb$gsc

$KEGG_GLYCOLYSIS_GLUCONEOGENESIS
 [1] "ACSS2"   "GCK"     "PGK2"    "PGK1"    "PDHB"    "PDHA1"   "PDHA2"   "PGM2"    "TPI1"    "ACSS1"   "FBP1"   
[12] "ADH1B"   "HK2"     "ADH1C"   "HK1"     "HK3"     "ADH4"    "PGAM2"   "ADH5"    "PGAM1"   "ADH1A"   "ALDOC"  
[23] "ALDH7A1" "LDHAL6B" "PKLR"    "LDHAL6A" "ENO1"    "PKM2"    "PFKP"    "BPGM"    "PCK2"    "PCK1"    "ALDH1B1"
[34] "ALDH2"   "ALDH3A1" "AKR1A1"  "FBP2"    "PFKM"    "PFKL"    "LDHC"    "GAPDH"   "ENO3"    "ENO2"    "PGAM4"  
[45] "ADH7"    "ADH6"    "LDHB"    "ALDH1A3" "ALDH3B1" "ALDH3B2" "ALDH9A1" "ALDH3A2" "GALM"    "ALDOA"   "DLD"    
[56] "DLAT"    "ALDOB"   "G6PC2"   "LDHA"    "G6PC"    "PGM1"    "GPI"    

$KEGG_CITRATE_CYCLE_TCA_CYCLE
 [1] "IDH3B"     "DLST"      "PCK2"      "CS"        "PDHB"      "PCK1"      "PDHA1"     "LOC642502" "PDHA2"    
[10] "LOC283398" "FH"        "SDHD"      "OGDH"      "SDHB"      "IDH3A"     "SDHC"      "IDH2"      "IDH1"     
[19] "ACO1"      "ACLY"      "MDH2"      "DLD"       "MDH1"      "DLAT"      "OGDHL"     "PC"        "SDHA"     
[28] "SUCLG1"    "SUCLA2"    "SUCLG2"    "IDH3G"     "ACO2"   
...
...
...
ADD REPLYlink written 2.1 years ago by arronar200
2

And the rownames of the count matrix you are using are the same symbol ids?

ADD REPLYlink written 2.1 years ago by Lluís R.890
1

Oh. my god. row.names are in lowercase while the gene-sets in uppercase. That is the cause of the problem.

ADD REPLYlink written 2.1 years ago by arronar200
1

I hope that while finding the solution to this problem you also learned how to track down your bugs and solve them. Good luck!

ADD REPLYlink written 2.1 years ago by Lluís R.890

If you aren't applying any filter by pathway size it might be a bug. Post it in support.bioconductor.org or https://github.com/rcastelo/GSVA/issues

ADD REPLYlink written 2.1 years ago by Lluís R.890

I'm not applying any filter

ADD REPLYlink written 2.1 years ago by arronar200

You may not be [applying filtering] but programs always perform filtering behind the scenes. You should review all possible parameters that can be used with gsva() and then modify those that could be producing the observed effect.

From what I gather, if no statistically significant enrichment can be performed for a sample, then it will not be included.

ADD REPLYlink written 2.1 years ago by Kevin Blighe56k

No, gsva doesn't have any filtering parameter, the problem is with the [1:186] part

ADD REPLYlink written 2.1 years ago by Lluís R.890

Did not read anywhere in the manual that it performs such kind of filtering

ADD REPLYlink written 2.1 years ago by arronar200
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 981 users visited in the last hour