Question: List of uniquely expressed genes
6 weeks ago by
Mozart200 wrote:

Dear all, I am trying to wrap my head around a way to define uniquely expressed genes without relying on any differential expression analysis approach. Essentially, I have been asked to find some sort of 'method' that lets me define a given gene as uniquely expressed in the dataset, so that I can then see whether that gene is shared across the other groups in the analysis.

To be more specific, these colleagues asked me to take the normalised counts matrix (3 replicates and 4 groups), look at each row (= gene), and set a fixed definition of when a gene is deemed to be expressed vs not expressed.

Any ideas? Not sure whether an ANOVA test can be useful in this case?


rna-seq counts • 159 views
6 weeks ago by
rpolicastro1.7k wrote:

What question are they trying to answer with the data? There might be a more direct way of testing it. If you are getting reads mapping to a gene, the gene is or was being expressed, so it becomes a somewhat subjective question at what relative expression level we begin to care about the expression.


Thanks for your comment. They are trying to determine the uniquely expressed genes in a certain dataset. They want me to create one set per group containing the genes that characterise that group. Do you have any idea how to deal with this? As far as I am aware, a relative comparison like differential expression analysis won't exactly answer this question and yes, this is totally subjective, I presume.

written 6 weeks ago by Mozart200

As implied, 'uniquely expressed' can only qualify as a statement when accompanied by a threshold value. The logical cut-off to use would be Z > 1.96, as this would imply that a particular gene is statistically significantly expressed above the mean value (5% alpha, two-tailed). So, I would convert your dataset to Z-scores and then calculate the mean Z-score per sample group.

If you were interested in quality management systems (I was / am), then go for Six Sigma (6σ), i.e., Z > 6.
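The Z-score suggestion above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the Z-scores are computed per gene across all samples (the comment does not say whether the scaling is per gene or over the whole dataset), the population standard deviation is used, and the names `row_zscores`, `group_mean_z`, `genes_above` and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def row_zscores(values):
    """Standardise one gene's values across all samples (assumption: per-gene scaling)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        # A gene with identical values everywhere has no variation to score.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

def group_mean_z(counts, sample_groups):
    """Return {gene: {group: mean Z-score of that group's replicates}}."""
    out = {}
    for gene, values in counts.items():
        by_group = {}
        for z, group in zip(row_zscores(values), sample_groups):
            by_group.setdefault(group, []).append(z)
        out[gene] = {g: mean(zs) for g, zs in by_group.items()}
    return out

def genes_above(group_z, threshold=1.96):
    """Return {group: [genes whose group-mean Z-score exceeds the threshold]}."""
    hits = {}
    for gene, per_group in group_z.items():
        for group, z in per_group.items():
            if z > threshold:
                hits.setdefault(group, []).append(gene)
    return hits

# Toy normalised counts: 4 groups x 3 replicates (made-up numbers).
sample_groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
counts = {
    "gene1": [100.0, 100.0, 100.0] + [0.0] * 9,  # present only in group A
    "gene2": [50.0] * 12,                        # flat across all samples
}

group_z = group_mean_z(counts, sample_groups)
hits = genes_above(group_z, threshold=1.5)
```

One caveat with this particular reading: when Z-scores are computed per gene over only 12 samples, the mean Z-score of a 3-sample group cannot exceed about 1.73 (with the population standard deviation), so the threshold is left as a parameter here rather than fixed at 1.96 or 6.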

written 6 weeks ago by Kevin Blighe65k

Thanks a lot, Kevin, for this. Could you explain why the Z-score has to be bigger than 1.96 and what its relationship with the mean value is, please? To be fair, I don't understand the rationale behind using Z-scores. I presume we can use them because they return standardised measurements? Thanks a lot in advance.

written 6 weeks ago by Mozart200

Good morning. The Z-score is a 'standardised' score that is readily interpretable. A value of 1.96 is, on a two-tailed standard normal distribution, equivalent to p = 0.05. Perhaps taking a look here will help:

So, if we find that ERBB2 (HER2) has a Z-score of 10 in a sub-group of our breast cancer patients (and is 0 - 0.5 in all others), we can infer that these patients are Her2-positive breast cancer patients and require Trastuzumab / Herceptin therapy.

Even by this definition, as you can see, using the term 'uniquely expressed' is difficult. Should it be that 'uniquely expressed' means that a gene has to have zero expression in one group?
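One strict reading of that question can be sketched as a presence/absence rule on the normalised counts. This is only an illustration under assumptions: a gene counts as "on" in a group when every replicate reaches a hypothetical `on_cutoff`, as "off" when every replicate is at or below a hypothetical `off_cutoff` (zero by default), and as uniquely expressed when it is on in exactly one group and off in all the others. The cutoffs and toy data are made up.

```python
def uniquely_expressed(counts, sample_groups, on_cutoff=10.0, off_cutoff=0.0):
    """Return {group: [genes on in that group and off in every other group]}.

    Assumes on_cutoff > off_cutoff so a group cannot be both on and off.
    """
    unique = {}
    for gene, values in counts.items():
        by_group = {}
        for value, group in zip(values, sample_groups):
            by_group.setdefault(group, []).append(value)
        # "On": all replicates reach the cutoff; "off": none exceeds the off cutoff.
        on = [g for g, vs in by_group.items() if min(vs) >= on_cutoff]
        off = [g for g, vs in by_group.items() if max(vs) <= off_cutoff]
        if len(on) == 1 and len(off) == len(by_group) - 1:
            unique.setdefault(on[0], []).append(gene)
    return unique

# Toy normalised counts: 4 groups x 3 replicates (made-up numbers).
sample_groups = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
counts = {
    "gene1": [25.0, 30.0, 28.0] + [0.0] * 9,                     # on in A, zero elsewhere
    "gene2": [25.0, 30.0, 28.0, 12.0, 15.0, 11.0] + [0.0] * 6,   # on in A and B
    "gene3": [25.0, 30.0, 28.0, 2.0, 0.0, 1.0] + [0.0] * 6,      # low background in B
}

strict = uniquely_expressed(counts, sample_groups)
relaxed = uniquely_expressed(counts, sample_groups, off_cutoff=5.0)
```

With the strict zero cutoff only `gene1` is assigned to group A; allowing a small background (`off_cutoff=5.0`) also admits `gene3`, which shows how much the answer depends on the chosen thresholds.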

modified 6 weeks ago • written 6 weeks ago by Kevin Blighe65k

Thanks, Kevin, for your reply. Very helpful. My big concern here is not only how to define a parameter that defines unique expression but also how to discriminate between different groups, if you get me. For example, what if one sub-group gets a Z-score of 10, two others 5, and one 2? How can you set a common parameter from which you could draw useful conclusions?

written 5 weeks ago by Mozart200