I am having difficulty interpreting error 1001 (After pruning, none of the gene sets passed size thresholds) when using the GSEA 3.0 desktop app.
My data consists of DESeq2-analyzed polyA-enriched RNA-seq reads. I ranked my data by a significance score that combines padj values and fold changes (https://www.ncbi.nlm.nih.gov/pubmed/22321699). Using the HGNC BioMart, I converted the Ensembl IDs in my data to human gene symbols (e.g. APOE) so that the gene IDs are consistent with the nomenclature of MSigDB. This data table, consisting of gene symbols (column 1) and ranked significance scores (column 2), was exported into a tab-delimited ".rnk" file. I then ran these files with the GSEAPreranked tool for various gene sets from MSigDB (h.all.v6.2 to c7.all.v6.2); however, I keep getting an error message insisting that none of my gene sets passed the size thresholds. Here are the parameters of my GSEA analysis:
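For readers reproducing this export step, it could look roughly like the sketch below. It assumes the score is the signed product of the log2 fold change and -log10(padj), which may not be the exact score used in the post; the function names, the `min_padj` floor, and the example values are all hypothetical.

```python
import csv
import math

def signed_score(log2fc, padj, min_padj=1e-300):
    """Assumed score: log2 fold change times -log10(padj); the sign follows the fold change."""
    padj = max(padj, min_padj)  # avoid log10(0) when padj underflows to zero
    return log2fc * -math.log10(padj)

def write_rnk(results, path):
    """Write (symbol, score) pairs as a two-column, tab-delimited file, highest score first.

    `results` is an iterable of (symbol, log2fc, padj). Rows without a symbol or
    with a missing padj (None or NaN) are skipped.
    """
    ranked = []
    for symbol, log2fc, padj in results:
        if not symbol or padj is None or math.isnan(padj):
            continue
        ranked.append((symbol, signed_score(log2fc, padj)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(ranked)
    return ranked

# Made-up values for illustration only
example = [
    ("APOE", 2.0, 0.001),
    ("TP53", -1.5, 0.01),
    ("", 3.0, 0.01),              # no gene symbol after ID conversion: skipped
    ("ACTB", 0.1, float("nan")),  # missing padj: skipped
]
ranked = write_rnk(example, "example.rnk")
```

Up-regulated genes end up at the top of the file and down-regulated genes at the bottom, so both directions live in one ranked list.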
Number of permutations: 1000
Enrichment statistic: classic
Max size: 5000
Min size: 15 (I've tried lowering this to 5 and all the way down to 0, and I still get the same error)
Normalization mode: meandiv
Looking at older threads, many people suggested setting the collapse option to "FALSE", but from my understanding this option was a feature of older GSEA versions and has now been removed or defaults to "FALSE". I also tried lowering the min size of my gene sets sequentially all the way to 0, and I still fail to get any results.
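Since the size thresholds are applied after each gene set has been restricted to genes present in the ranked list, one way to investigate this error is to count, outside GSEA, how many members of each set actually appear in the ".rnk" file. The following is a minimal sketch and not part of GSEA itself; the helper names and demo files are hypothetical, and it assumes the usual GMT layout (set name, description, then gene symbols, all tab-separated).

```python
import os
import tempfile

def read_rnk_symbols(path):
    """Return the set of identifiers in column 1 of a tab-delimited ranked list."""
    symbols = set()
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):  # skip blank and comment lines, if any
                continue
            symbols.add(line.split("\t")[0].strip())
    return symbols

def read_gmt(path):
    """Return {set name: set of gene symbols} from a GMT file."""
    gene_sets = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue
            gene_sets[fields[0]] = {gene for gene in fields[2:] if gene}
    return gene_sets

def sets_passing(gene_sets, ranked_symbols, min_size=15, max_size=5000):
    """Return {set name: overlap size} for sets whose overlap lies within the thresholds."""
    passing = {}
    for name, genes in gene_sets.items():
        overlap = len(genes & ranked_symbols)
        if min_size <= overlap <= max_size:
            passing[name] = overlap
    return passing

# Tiny made-up files, only to show the calls
with tempfile.TemporaryDirectory() as tmp:
    rnk_path = os.path.join(tmp, "demo.rnk")
    gmt_path = os.path.join(tmp, "demo.gmt")
    with open(rnk_path, "w") as handle:
        handle.write("APOE\t6.0\nTP53\t-3.0\nACTB\t0.2\n")
    with open(gmt_path, "w") as handle:
        handle.write("SET_A\tna\tAPOE\tTP53\tBRCA1\n")
        handle.write("SET_B\tna\tENSG00000130203\tENSG00000141510\n")
    demo_sets = read_gmt(gmt_path)
    demo_passing = sets_passing(demo_sets, read_rnk_symbols(rnk_path), min_size=2)
    for name in demo_sets:
        print(name, "passes" if name in demo_passing else "is pruned")
```

If nearly every overlap comes out as zero, the identifiers in the first column of the ranked list (for example Ensembl IDs, lowercase symbols, or symbols with stray whitespace) probably do not match the symbols used in the gene set file.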
My questions are the following:
Is there something wrong with my analysis process or is this perhaps a "real" biological result and there are no enriched gene sets? I am skeptical that there are no enriched gene sets because GO enrichment analyses with DAVID or Funcassociate 3.0 show dozens of enriched categories.
Should genes with both increasing and decreasing expression be included in the same ".rnk" file? The significance score I used to rank these genes is directional, meaning that increasing and decreasing genes have positive and negative scores, respectively.
Thank you in advance!!
I think you have to use your complete gene list for your GSEA analysis (all genes, including those that are not differentially expressed). Maybe you could post the start and end of your ranked list, along with the number of rows in the table, so we can see the format and the values.
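A short script along these lines could produce that summary (head, tail, and length of the ranked list). It is only a sketch: `summarize_rnk` and the demo file are hypothetical, and it assumes a two-column, tab-delimited file with no header row.

```python
import os
import tempfile

def summarize_rnk(path, n=5):
    """Return the row count, first and last n rows, and duplicate count of a ranked list."""
    rows = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line or line.startswith("#"):  # skip blank and comment lines, if any
                continue
            fields = line.split("\t")
            if len(fields) < 2:
                raise ValueError(f"line {number} is not tab-delimited: {line!r}")
            rows.append((fields[0], float(fields[1])))
    unique_ids = {symbol for symbol, _ in rows}
    return {
        "length": len(rows),
        "head": rows[:n],
        "tail": rows[-n:],
        "n_duplicates": len(rows) - len(unique_ids),
    }

# Made-up ranked list for illustration
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.rnk")
    with open(demo_path, "w") as handle:
        handle.write("APOE\t6.0\nACTB\t0.2\nGAPDH\t-0.1\nTP53\t-3.0\nTP53\t-3.5\n")
    summary = summarize_rnk(demo_path, n=2)

print(summary["length"], "rows;", summary["n_duplicates"], "duplicated identifiers")
print("head:", summary["head"])
print("tail:", summary["tail"])
```

A `ValueError` on the first data line usually points to a header row or a file that is not actually tab-delimited, both of which are worth ruling out here.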
I have only just started using GSEA, but as far as I understand there is a difference between the settings of Run GSEAPreranked and Run GSEA.
In the first one there is no option to set collapsing to FALSE, unlike in the second; they are two different types of analysis.
Collapsing is necessary to associate the gene symbols of your list with the probes of the chip platform.