Question: Basic idea of applying a gene signature
5.1 years ago by
Avro140 wrote:

Hi everyone,

I am a PhD student in biochemistry, and I am learning about gene expression signatures. My lab generated a 36-gene mouse signature. These genes are all highly expressed. I am interested in identifying "mouse-like" human samples in a large set of primary breast tumors.

I was wondering if someone could please give me general guidelines on how to apply a gene signature. I can write code, but I don't understand the principles (I am reading, though). Is it based on the gene names and their fold changes, or just the names? I am sorry for asking such a basic question, but I am still learning this aspect of bioinformatics. I read that a naive Bayes classifier is a good idea. Alternatively, would it make sense to rank the samples (based on how well they express the signature) and use bootstrap resampling?

I would also greatly appreciate being pointed to an earlier post or tutorial.

Thank you!





5.1 years ago by
Devon Ryan93k (Freiburg, Germany) wrote:

One possibility (that wouldn't even require writing much code) is to treat this signature as a gene set and run GSEA on the human samples to look for samples in which that set is more highly expressed than expected. The general idea is to run GSEA on a large number of samples, many of which you expect not to show enrichment, and then look at the resulting distribution of enrichment scores (or p-values). From that, you should be able to tell whether the expression of this set roughly follows a normal distribution or whether the distribution is bimodal, meaning that there's a subset of samples that you're going to be very interested in. You could alternatively use resampling there, though I think it'll be quicker and easier to just have a look at the distributions first (nothing is preventing you from doing both).

That's one fairly straightforward possibility, though there are others.
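
To make the per-sample idea concrete, here is a minimal sketch in plain Python. It is not the GSEA software: it uses an unweighted Kolmogorov-Smirnov-style running sum, whereas GSEA by default weights the steps by expression and assesses significance by permutation. The function names and the toy data are made up for illustration, and the sketch assumes the mouse genes have already been mapped to human gene identifiers.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted KS-style running-sum score for one ranked gene list.

    ranked_genes: genes ordered from highest to lowest expression.
    gene_set: the signature genes (assumed already mapped to human IDs).
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a signature gene, down otherwise; the walk ends at 0.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


def score_samples(expression, gene_set):
    """expression: {sample_id: {gene: value}}. Returns {sample_id: score}."""
    gene_set = set(gene_set)
    scores = {}
    for sample, profile in expression.items():
        ranked = sorted(profile, key=profile.get, reverse=True)
        scores[sample] = enrichment_score(ranked, gene_set)
    return scores


# Toy data, invented purely for illustration.
signature = {"G1", "G2"}
expression = {
    "tumor_A": {"G1": 9.0, "G2": 8.5, "G3": 2.0, "G4": 1.0},
    "tumor_B": {"G1": 1.0, "G2": 2.0, "G3": 8.0, "G4": 9.0},
}
scores = score_samples(expression, signature)
for sample, es in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(sample, round(es, 2))
```

With real data, the scores from all tumors could then be plotted as a histogram to check by eye whether a distinct high-scoring subgroup exists.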


Thank you! I have just started looking at GSEA's documentation. If I have normalized human Illumina HT-12 v3 gene expression data (breast tumor vs. normal) and a list of the 36 genes, I should be able to run GSEA, right? I am asking because GSEA can be run in different ways. Thank you once again for your help.

written 5.1 years ago by Avro140

Yup, that should work!

written 5.1 years ago by Devon Ryan93k

Hi, I am a first-year master's student in bioinformatics with a bachelor's degree in molecular biology.
I have a question that seems related to the one asked here. I have analysed ChIP-seq data for 100 transcription factors (TFs) of C. elegans by calling targets for each of these factors. I now have a table with 40k rows (all genes in C. elegans) and 100 columns (all available TFs). Each cell contains a score that reflects how likely it is that the given factor affects the given gene, so for each TF I have a ranked list of genes. Besides this table, I also have ten gene sets of different sizes (from 100 to 1000 genes). It may be important to mention that there is no overlap between these gene sets.

The question I seek to answer is which TF is most likely to regulate each of the gene sets. I've realized that I can use GSEA here, but I cannot figure out exactly how it should be applied in this case. Maybe I could use some other implementation of a random walk?

I would appreciate any suggestions and ideas.

Thanks in advance,




modified 4.7 years ago • written 4.7 years ago by Tim Padvitski0
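
One way the setup in this follow-up question could be framed, offered only as a sketch: for each TF, ask whether the genes of a given set tend to have higher scores than the rest of the genome. The code below swaps in a Wilcoxon rank-sum z-score (normal approximation, no tie correction) rather than GSEA's running sum, simply because it is short and easy to check. All names and the toy table are hypothetical, and with 100 TFs and ten gene sets any real analysis would also need a multiple-testing correction.

```python
import math


def rank_sum_z(scores, gene_set):
    """Normal-approximation z-score of the Wilcoxon rank-sum statistic.

    scores: {gene: score} for one TF (higher = more likely a target).
    gene_set: genes to test. Ties are not corrected for in this sketch.
    """
    ordered = sorted(scores, key=scores.get)  # rank 1 = lowest score
    ranks = {gene: i + 1 for i, gene in enumerate(ordered)}
    members = [g for g in gene_set if g in ranks]
    n, total = len(members), len(ordered)
    m = total - n
    if n == 0 or m == 0:
        return 0.0
    w = sum(ranks[g] for g in members)           # observed rank sum
    mean = n * (total + 1) / 2.0                 # expected under the null
    sd = math.sqrt(n * m * (total + 1) / 12.0)   # null standard deviation
    return (w - mean) / sd


def best_tf_per_set(tf_scores, gene_sets):
    """tf_scores: {tf: {gene: score}}; gene_sets: {set_name: set of genes}."""
    result = {}
    for name, genes in gene_sets.items():
        z = {tf: rank_sum_z(col, genes) for tf, col in tf_scores.items()}
        result[name] = max(z, key=z.get)  # TF with the strongest enrichment
    return result


# Toy table, invented purely for illustration.
tf_scores = {
    "TF1": {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.2},
    "TF2": {"a": 0.1, "b": 0.2, "c": 0.9, "d": 0.8},
}
gene_sets = {"set1": {"a", "b"}, "set2": {"c", "d"}}
best = best_tf_per_set(tf_scores, gene_sets)
print(best)
```

A large positive z-score means the set's genes sit near the top of that TF's ranking; comparing z-scores across the 100 columns gives a first-pass ordering of candidate regulators for each set.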