Question

List of uniquely expressed genes

0

3.7 years ago

Mozart ▴ 330

Dear all, I am trying to wrap my head around a way to define uniquely expressed genes without relying on any differential expression analysis approach. Essentially, I have been asked to find some sort of 'method' that lets me define a gene as uniquely expressed in a group, so that I can then see whether it is shared with the other groups in the analysis.

To be more specific, these colleagues asked me to take the normalised counts matrix (4 groups, 3 replicates each), look at each row (= gene), and set a fixed definition of when a gene is deemed expressed versus not expressed.

Any ideas? Could an ANOVA test be useful in this case?

Thanks

rna-seq RNA-Seq counts • 1.1k views

ADD COMMENT • link 3.7 years ago by Mozart ▴ 330

score 0 · Answer 1 · 2020-08-13

0

3.7 years ago

rpolicastro 13k

What question are they trying to answer with the data? There might be a more direct way of testing it. If reads map to a gene, then that gene is or was being expressed, so it becomes a somewhat subjective question at what relative expression level we begin to care about the expression.

ADD COMMENT • link 3.7 years ago by rpolicastro 13k

0

Thanks for your comment. They are trying to determine the uniquely expressed genes in a given dataset. They want me to create one set per group containing the genes that characterise that group. Do you have any idea how to deal with this? As far as I am aware, a relative comparison like differential expression analysis won't exactly answer this question and, yes, I presume this is totally subjective.
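
To make the "one set per group" idea concrete, here is a minimal sketch of a fixed presence/absence rule. Everything in it is an assumption for illustration: the function names, the `min_count` cut-off of 10, and the rule that a gene counts as expressed in a group only if every replicate reaches the cut-off. None of these choices comes from a standard method.

```python
def expressed_sets(counts, groups, min_count=10.0):
    """Return {group: set of genes called 'expressed' in that group}.

    counts: dict mapping gene -> list of normalised counts, one per sample.
    groups: list of group labels, in the same order as the samples.
    min_count: hypothetical cut-off; a gene is called expressed in a group
               only if ALL replicates of that group reach it.
    """
    labels = sorted(set(groups))
    expressed = {label: set() for label in labels}
    for gene, values in counts.items():
        for label in labels:
            replicates = [v for g, v in zip(groups, values) if g == label]
            if all(v >= min_count for v in replicates):
                expressed[label].add(gene)
    return expressed


def unique_sets(expressed):
    """Keep, for each group, only the genes expressed in no other group."""
    unique = {}
    for label, genes in expressed.items():
        others = set().union(*(s for l, s in expressed.items() if l != label))
        unique[label] = genes - others
    return unique
```

The resulting sets depend entirely on the chosen cut-off and replicate rule, which is the subjectivity discussed above.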

ADD REPLY • link 3.7 years ago by Mozart ▴ 330

0

As implied, 'uniquely expressed' only becomes a meaningful statement when it is accompanied by a threshold value. The logical cut-off to use would be Z > 1.96, as this would imply that a particular gene is statistically significantly expressed above the mean value (5% alpha, two-tailed). So, I would convert your dataset to Z-scores and then calculate the mean Z-score per sample group.

If you were interested in quality management systems (I was / am), then go for Six Sigma (6σ), i.e., Z > 6.
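
The Z-score suggestion above could be sketched roughly as follows. This is only one possible reading. The sketch assumes the counts are log2-transformed with a pseudocount of 1, and that Z-scores are computed against the mean and standard deviation of the whole matrix (all genes, all samples). The function names and parameters are hypothetical. Whole-matrix scaling is used here because, with per-gene scaling across only 12 samples in 4 groups of 3, a group's mean Z-score is capped at roughly 1.7 and could never pass the 1.96 cut-off.

```python
import math
import statistics


def group_mean_z(counts, groups, pseudocount=1.0):
    """Return {gene: {group: mean Z-score}}.

    counts: dict mapping gene -> list of normalised counts, one per sample.
    groups: list of group labels, in the same order as the samples.
    Assumption: Z-scores are relative to the mean and standard deviation
    of all log2-transformed values in the matrix, not per gene.
    """
    logged = {
        gene: [math.log2(v + pseudocount) for v in values]
        for gene, values in counts.items()
    }
    all_values = [v for values in logged.values() for v in values]
    mu = statistics.mean(all_values)
    sd = statistics.stdev(all_values)

    result = {}
    for gene, values in logged.items():
        per_group = {}
        for label, v in zip(groups, values):
            per_group.setdefault(label, []).append((v - mu) / sd)
        result[gene] = {
            label: statistics.mean(zs) for label, zs in per_group.items()
        }
    return result


def groups_above(mean_z, cutoff=1.96):
    """List, for each gene, the groups whose mean Z-score exceeds the cut-off."""
    return {
        gene: sorted(label for label, z in zs.items() if z > cutoff)
        for gene, zs in mean_z.items()
    }
```

A gene whose list contains exactly one group would then be a candidate for 'uniquely expressed' under this particular definition.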

ADD REPLY • link 3.7 years ago by Kevin Blighe 87k

0

Thanks a lot, Kevin, for this. Could you explain why the Z-score has to be greater than 1.96, and what its relationship with the mean value is, please? To be fair, I don't understand the rationale behind using Z-scores. I presume we can use them because they return standardised measurements? Thanks a lot in advance.

ADD REPLY • link 3.7 years ago by Mozart ▴ 330

0

Buenos días e Buon Giorno. The Z-score is a 'standardised' score that is readily interpretable. On a two-tailed test, an absolute Z-score of 1.96 is equivalent to p = 0.05. Perhaps taking a look here will help:

So, if we find that ERBB2 (HER2) has a Z-score of 10 in a sub-group of our breast cancer patients (and is between 0 and 0.5 in all others), we can infer that these patients are HER2-positive breast cancer patients and require Trastuzumab / Herceptin therapy.

Even by this definition, as you can see, using the term 'uniquely expressed' is difficult. Should it be that 'uniquely expressed' means that a gene has to have zero expression in one group?
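
The link between Z = 1.96 and p = 0.05 mentioned above can be checked numerically. The small sketch below uses only Python's standard library and assumes a standard normal distribution; the function names are made up for illustration. Note that 0.05 is the two-tailed value; read strictly as "above the mean" (one tail), Z > 1.96 corresponds to about 0.025.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def two_tailed_p(z):
    """Probability of a value at least |z| away from the mean, in either direction."""
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))


def one_tailed_p(z):
    """Probability of a value greater than z (upper tail only)."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


def z_for_two_tailed_alpha(alpha):
    """Z cut-off whose two-tailed p-value equals alpha."""
    return STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
```

For example, `two_tailed_p(1.96)` is approximately 0.05 and `z_for_two_tailed_alpha(0.05)` is approximately 1.96.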

ADD REPLY • link 3.7 years ago by Kevin Blighe 87k

0

Thanks, Kevin, for your reply. Very helpful. My big concern here is not only to define a parameter that defines unique expression but also how to discriminate between the different groups, if you get me. For example, what if one sub-group gets a Z-score of 10, two others get 5, and one gets 2? How can you set a common parameter from which you could draw useful conclusions?

ADD REPLY • link 3.7 years ago by Mozart ▴ 330