Question: GSEA analysis error
marco wrote (4 weeks ago):


I am trying to run GSEA on my RNA-seq dataset using the tool provided by the Broad Institute, which I downloaded from their website. As input I am using my expression dataset (all genes, not only DEGs), with Gene Symbols as identifiers followed by the normalized counts for the samples. In addition to the expression dataset, I have generated a phenotype label (.cls) file, as the tool requires.
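For reference, here is a minimal sketch of how the two input files described above could be written from Python. It assumes the GCT v1.2 layout (version line, dimensions line, a `NAME`/`Description` header, then one row per gene) and the categorical CLS layout (counts line, a `#` line of class names, then one label per sample). The function names, sample names and values are hypothetical, and `na` is only a placeholder description.

```python
"""Sketch: write a GSEA expression dataset (.gct) and phenotype labels (.cls).

Assumption: `counts` maps gene symbol -> normalized counts, ordered like `samples`.
All names and values below are hypothetical.
"""


def write_gct(path, samples, counts):
    """Write a GCT v1.2 file: version, dimensions, header, then one row per gene."""
    with open(path, "w") as fh:
        fh.write("#1.2\n")
        fh.write(f"{len(counts)}\t{len(samples)}\n")
        fh.write("\t".join(["NAME", "Description"] + samples) + "\n")
        for gene, values in counts.items():
            # The Description column must be present; "na" is a placeholder.
            fh.write("\t".join([gene, "na"] + [str(v) for v in values]) + "\n")


def write_cls(path, labels):
    """Write a categorical CLS file: counts line, class names, per-sample labels."""
    classes = list(dict.fromkeys(labels))  # class names in order of first appearance
    with open(path, "w") as fh:
        fh.write(f"{len(labels)} {len(classes)} 1\n")
        fh.write("# " + " ".join(classes) + "\n")
        fh.write(" ".join(labels) + "\n")


if __name__ == "__main__":
    samples = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
    counts = {"GENE_A": [10.5, 11.2, 30.1, 28.7], "GENE_B": [5.0, 4.8, 1.2, 1.5]}
    write_gct("expression.gct", samples, counts)
    write_cls("phenotypes.cls", ["ctrl", "ctrl", "treat", "treat"])
```

The class names on the `#` line are listed in the order they first appear among the sample labels, so the labels line and the header stay consistent.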

In "gene set database" I selected the collections from h to c6, including only the ".all" files to avoid duplicates. For "permutation type" I selected "gene_set".
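Each collection selected there is a gene set file in GMT format: one gene set per line, with the set name, a description (often a URL), then the member genes, all tab-separated. A minimal reader, sketched under that assumption, shows which identifiers have to match the `NAME` column of the expression dataset. The set and gene names in the example are hypothetical.

```python
def read_gmt(lines):
    """Parse GMT lines into {set name: set of gene identifiers}."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]  # fields[1] is the description/URL
        gene_sets[name] = {g for g in genes if g}  # drop empty trailing fields
    return gene_sets


if __name__ == "__main__":
    example = [
        "SET_A\thttp://example.org/SET_A\tGENE_A\tGENE_B\tGENE_C\n",
        "SET_B\tna\tGENE_B\tGENE_D\n",
    ]
    for name, genes in read_gmt(example).items():
        print(name, sorted(genes))
```

With a downloaded collection, `read_gmt(fh)` inside a `with open(path) as fh:` block works the same way, since iterating a file yields its lines.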

When I try to run the analysis, I am unsure what to select for the "Collapse" option. If I select "No_Collapse", I get the following error message:

After pruning, none of the gene sets passed size thresholds.
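This message appears to come from the pruning step, in which each gene set is restricted to the genes present in the expression dataset and then filtered by size (GSEA's default "Min size" and "Max size" are 15 and 500). The sketch below is a simplified model of that step, not GSEA's code. It shows why, if the dataset identifiers do not match the identifiers used in the gene set files, every overlap falls below the minimum and no set survives. Function and gene names are hypothetical.

```python
def prune_gene_sets(gene_sets, dataset_genes, set_min=15, set_max=500):
    """Restrict each set to genes in the dataset; keep sets whose restricted
    size lies in [set_min, set_max] (simplified model of GSEA's size filter)."""
    present = set(dataset_genes)
    kept = {}
    for name, genes in gene_sets.items():
        overlap = set(genes) & present
        if set_min <= len(overlap) <= set_max:
            kept[name] = overlap
    return kept


if __name__ == "__main__":
    gene_sets = {"SET_A": {f"GENE{i}" for i in range(20)}}
    matching = [f"GENE{i}" for i in range(20)]   # same identifier type as the sets
    mismatched = [f"ID{i}" for i in range(20)]   # a different identifier namespace
    print(len(prune_gene_sets(gene_sets, matching)))    # 1 set kept
    print(len(prune_gene_sets(gene_sets, mismatched)))  # 0 sets: nothing passes
```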

If I instead select "Collapse", it requires me to select a "ChIP platform", and I am very confused about what to choose. Since my expression dataset uses Gene Symbols as identifiers, I tried selecting "Human_Symbol_with_Remapping_MSigDB.v7.1.chip", but I get the following error:

The collapsed dataset was empty when used with
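Collapsing uses the CHIP file as a lookup table from the identifiers in the dataset's `NAME` column to gene symbols, and rows whose identifier is not found are dropped. The sketch below models that idea with a plain dictionary and, when several identifiers map to the same symbol, keeps the per-sample maximum (GSEA offers several collapsing modes; this is just one rule chosen for illustration). It suggests one plausible reading of the "collapsed dataset was empty" message: none of the dataset identifiers matched the CHIP file. The probe IDs and symbols are hypothetical.

```python
def collapse_dataset(dataset, chip):
    """dataset: identifier -> values; chip: identifier -> gene symbol.
    Unmapped identifiers are dropped; duplicates are merged by per-sample max."""
    collapsed = {}
    for identifier, values in dataset.items():
        symbol = chip.get(identifier)
        if symbol is None:
            continue  # identifier not present in the CHIP mapping
        if symbol in collapsed:
            collapsed[symbol] = [max(a, b) for a, b in zip(collapsed[symbol], values)]
        else:
            collapsed[symbol] = list(values)
    return collapsed


if __name__ == "__main__":
    chip = {"1001_at": "GENE_A", "1002_at": "GENE_A", "2001_at": "GENE_B"}
    by_probe = {"1001_at": [1.0, 5.0], "1002_at": [3.0, 2.0]}
    by_symbol = {"GENE_A": [1.0, 5.0]}  # NAME column holds symbols, not probe IDs
    print(collapse_dataset(by_probe, chip))   # {'GENE_A': [3.0, 5.0]}
    print(collapse_dataset(by_symbol, chip))  # {}: the collapsed dataset is empty
```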

Any help would be very appreciated!



"followed by the normalized counts for the samples"

GSEA requires a ranked list, e.g. genes ranked by significance. How did you generate this?

(comment by ATpoint, 4 weeks ago)
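If you do want to supply a ranked list (for the GSEAPreranked tool, which takes a two-column `.rnk` file of gene and score), one common heuristic is to rank by signed significance, sign(log2 fold change) × -log10(p-value), taken from a differential expression test. The sketch below assumes those statistics are already available per gene; the function name and input layout are hypothetical, and other ranking scores are equally valid.

```python
import math


def write_rnk(path, results):
    """results: gene -> (log2 fold change, p-value).
    Score = sign(log2FC) * -log10(p); writes 'gene<TAB>score' sorted high to low."""
    scores = {}
    for gene, (lfc, pvalue) in results.items():
        pvalue = max(pvalue, 1e-300)  # guard against log10(0)
        scores[gene] = math.copysign(-math.log10(pvalue), lfc)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    with open(path, "w") as fh:
        for gene, score in ranked:
            fh.write(f"{gene}\t{score:.6g}\n")
    return ranked


if __name__ == "__main__":
    results = {"GENE_A": (2.0, 0.001), "GENE_B": (-1.0, 0.01), "GENE_C": (0.5, 0.5)}
    for gene, score in write_rnk("ranked.rnk", results):
        print(gene, round(score, 3))
```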

I am pretty new to this, so I am not sure I understand exactly what you mean. I uploaded my expression dataset, which has one column with all the gene symbols and one column of normalized counts for each sample.

(reply by marco, 4 weeks ago)
Danielle B (Johns Hopkins University) wrote (29 days ago):

Hi Marco, I ran into the same issues when I was doing this a few weeks ago. In my .gct file, I ended up putting my genes' Entrez IDs in the first column ("Name") and the corresponding gene names in the "Description" column. This allowed me to collapse my dataset in GSEA (even though I didn't really need to). From the drop-down I selected the Human_NCBI_Entrez_Gene_ID_MSigDB... option, since it matched the identifiers in the "Name" column. I only selected one Gene Sets Database at a time, though I am not sure how relevant that is.
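The layout described above can be sketched as follows: Entrez Gene IDs go in the `NAME` column (so that an Entrez-based CHIP file can match them) and the symbols go in `Description`. The symbol-to-Entrez lookup table is an assumption here; in practice it would come from an annotation resource, and this sketch simply drops genes without an Entrez ID. Only the TP53/7157 pair in the example is a real mapping; the other names are hypothetical.

```python
def to_entrez_rows(counts, symbol_to_entrez):
    """Build GCT data rows: [Entrez ID (NAME), symbol (Description), values...].
    `symbol_to_entrez` is an assumed lookup table; unmapped symbols are skipped."""
    rows = []
    for symbol, values in counts.items():
        entrez = symbol_to_entrez.get(symbol)
        if entrez is None:
            continue  # no Entrez ID known for this symbol
        rows.append([str(entrez), symbol] + [str(v) for v in values])
    return rows


if __name__ == "__main__":
    counts = {"TP53": [10.5, 30.1], "GENE_X": [1.0, 2.0]}
    mapping = {"TP53": 7157}  # TP53's Entrez Gene ID; GENE_X is hypothetical and unmapped
    for row in to_entrez_rows(counts, mapping):
        print("\t".join(row))  # one tab-separated GCT data line
```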

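Also, in response to @ATpoint's comment: you don't need to supply a ranked gene list as input. The standard GSEA run ranks the genes itself, using a ranking metric (Signal2Noise by default) computed from the expression dataset and the phenotype labels; to me, that's part of its magic! A pre-ranked list is only needed if you use the GSEAPreranked tool instead. Let me know if this ends up helping!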
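To illustrate the ranking GSEA does itself, here is a sketch of the signal-to-noise metric, (μA - μB) / (σA + σB), computed per gene from the expression values and phenotype labels. The floor on σ (at least 0.2·|μ|, with μ = 0 treated as 1) follows the description in the GSEA user guide, but treat the exact details as an assumption, as is the use of the sample standard deviation; this is an illustration, not GSEA's implementation. Function and gene names are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """(mean_A - mean_B) / (sd_A + sd_B), each sd floored at 0.2 * |mean|
    (a mean of 0 is treated as 1). Needs at least 2 samples per class."""

    def floored_sd(values, mu):
        floor = 0.2 * (abs(mu) if mu != 0 else 1.0)
        return max(stdev(values), floor)

    mu_a, mu_b = mean(values_a), mean(values_b)
    return (mu_a - mu_b) / (floored_sd(values_a, mu_a) + floored_sd(values_b, mu_b))


def rank_genes(counts, labels, class_a, class_b):
    """Rank genes by signal-to-noise of class_a vs class_b, highest first."""
    idx_a = [i for i, lab in enumerate(labels) if lab == class_a]
    idx_b = [i for i, lab in enumerate(labels) if lab == class_b]
    scores = {
        gene: signal_to_noise([v[i] for i in idx_a], [v[i] for i in idx_b])
        for gene, v in counts.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    counts = {"UP": [1, 2, 10, 12], "DOWN": [10, 12, 1, 2], "FLAT": [5, 6, 5, 6]}
    labels = ["ctrl", "ctrl", "treat", "treat"]
    for gene, score in rank_genes(counts, labels, "treat", "ctrl"):
        print(gene, round(score, 3))
```

The floor keeps genes with near-identical replicates from getting huge scores just because their standard deviation is close to zero.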

Best, Danielle
