Question: GSEA background list
5.1 years ago, insertname wrote:

I am having trouble understanding the GSEA workflow.

Say you have a VCF file with variants and would like to determine whether a gene set associated with some disease is enriched in your gene list.

Would it make sense to define the gene list as the genes that match variants associated with the disease, and the background as all of the variants?

Is that even a valid application? 

I am new to bioinformatics, so sorry if my question seems obvious.

Thank you :)

5.1 years ago, ethan.kaufman wrote:

What you're proposing doesn't sound like GSEA. Without going into too much detail, GSEA refers to software first published here. It requires a gene list that is comprehensive (a superset of any gene set you may want to test) and ranked by some numerical variable (usually fold change in expression between two states).

In your case, your gene list is not ordered (it is just a set of mutated genes) and not comprehensive (not every gene will have a mutation). Your question is simply whether some gene set (say, genes with the OMIM label "breast cancer") is enriched in your "list" of mutated genes. This can be answered with a straightforward statistical test: Fisher's Exact Test, which can be done with a calculator, in Excel or R, or with an online tool like DAVID. The test requires a "background list", which is the list of genes in which you had the potential to find a mutation. The simplest background list would be all genes in the genome. However, the background should be considered carefully before applying the test, because it is often the source of mistakes. For example, were all genes sequenced to sufficient depth that a mutation would have been found had one existed? If not, the genes that were not should be excluded from the background.
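
To make the background point concrete, here is a minimal Python sketch that restricts everything to a depth-filtered background and tallies the 2x2 table that Fisher's Exact Test takes as input. The gene names, coverage values, the `min_depth` cutoff of 20, and the helper name `contingency_table` are all invented for illustration and do not come from any real dataset or tool.

```python
def contingency_table(mutated, gene_set, background):
    """Return (a, b, c, d) counts for a 2x2 enrichment table.

    a: mutated, in gene set        b: mutated, not in gene set
    c: not mutated, in gene set    d: not mutated, not in gene set
    Genes outside the background are ignored entirely.
    """
    background = set(background)
    mutated = set(mutated) & background
    gene_set = set(gene_set) & background
    a = len(mutated & gene_set)
    b = len(mutated - gene_set)
    c = len(gene_set - mutated)
    d = len(background) - a - b - c
    return a, b, c, d


# Hypothetical mean sequencing depth per gene (made-up numbers).
depth = {"BRCA1": 45, "BRCA2": 60, "TP53": 38, "ATM": 8,
         "EGFR": 52, "KRAS": 41, "MYC": 5, "PTEN": 33}
min_depth = 20  # assumed cutoff; choose one that suits your data

# Background: only genes where a mutation could plausibly have been detected.
background = {gene for gene, dp in depth.items() if dp >= min_depth}

mutated = {"BRCA1", "TP53", "KRAS", "MYC"}     # genes with a variant in the VCF
gene_set = {"BRCA1", "BRCA2", "TP53", "ATM"}   # e.g. a disease-associated set

table = contingency_table(mutated, gene_set, background)
print(table)
```

Note that MYC and ATM drop out of every count because they fall below the depth cutoff, which is exactly the kind of exclusion described above.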

The test compares two fractions: the fraction of genes in your gene list that belong to the gene set, and the fraction of genes in the background list that belong to the gene set. If the gene set is enriched, the first fraction should be higher than the second, and the test tells you whether the difference is larger than you would expect by chance.
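
As a rough sketch of that comparison, the Python below computes both fractions and a one-sided (over-representation) Fisher's Exact Test p-value from the hypergeometric distribution, using only the standard library. The function name `enrichment_p_value` and the counts in the example are hypothetical, chosen only to show the mechanics.

```python
from math import comb


def enrichment_p_value(a, b, c, d):
    """One-sided Fisher's exact test p-value for over-representation.

    a: in gene list and in gene set      b: in gene list, not in gene set
    c: background only, in gene set      d: background only, not in gene set
    Returns P(overlap >= a) when the gene list is drawn at random
    from the background.
    """
    n_list = a + b            # size of the gene list
    n_set = a + c             # gene-set members in the background
    n_total = a + b + c + d   # size of the background
    k_max = min(n_list, n_set)
    favourable = sum(
        comb(n_set, k) * comb(n_total - n_set, n_list - k)
        for k in range(a, k_max + 1)
    )
    return favourable / comb(n_total, n_list)


# Made-up counts: 100 mutated genes, 12 of them in a 200-gene set,
# with a background of 20,000 genes.
a, b, c, d = 12, 88, 188, 19712

fraction_in_list = a / (a + b)
fraction_in_background = (a + c) / (a + b + c + d)
p_value = enrichment_p_value(a, b, c, d)

print(f"gene list:  {fraction_in_list:.3f}")
print(f"background: {fraction_in_background:.3f}")
print(f"one-sided p-value: {p_value:.3g}")
```

If SciPy is available, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` should report the same one-sided p-value. When testing many gene sets, the resulting p-values also need a multiple-testing correction.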
