Question: Normalizing transcriptome data by tissue type
2.6 years ago, tamaraslosarek (10) wrote:

Hi guys,

We are working on a university project in which we want to find genes that discriminate between different cancer types. For that, we are using gene expression data from the TCGA dataset.

A naive approach would be to simply run some feature selection on the tumor data of each type. However, we assume that this would identify genes characteristic of the cell type of origin rather than of the tumor. For example, we want to compare thyroid and lung cancer. Using only tumor data, we would expect to find differentially expressed genes that are specific not to the tumor but to the original cell type. So we want to "normalize" thyroid tumor data against healthy thyroid tissue to first find discriminating genes for thyroid tumor, which can then be compared with the "normalized" genes for lung cancer.
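
A minimal sketch of this idea, assuming log2 expression matrices (genes x samples) with matched healthy tissue available for each cancer type. The function name `tumour_vs_normal_lfc` and the toy numbers are purely illustrative and are not TCGA data or an established method:

```python
import numpy as np

def tumour_vs_normal_lfc(tumour, normal):
    """Per-gene difference of mean log2 expression (tumour minus normal).

    Both inputs are genes x samples arrays of log2 expression values.
    """
    return tumour.mean(axis=1) - normal.mean(axis=1)

# Toy data, 3 genes x 2 samples (made-up values):
#   gene 0 is a tissue marker (high in thyroid, low in lung, unchanged in tumour)
#   gene 1 is up-regulated only in the thyroid tumour
#   gene 2 is unchanged everywhere
thyroid_normal = np.array([[10.0, 10.0], [5.0, 5.0], [7.0, 7.0]])
thyroid_tumour = np.array([[10.0, 10.0], [8.0, 8.0], [7.0, 7.0]])
lung_normal = np.array([[2.0, 2.0], [5.0, 5.0], [7.0, 7.0]])
lung_tumour = np.array([[2.0, 2.0], [5.0, 5.0], [7.0, 7.0]])

# Naive comparison of tumours only: the tissue marker (gene 0) dominates.
naive = thyroid_tumour.mean(axis=1) - lung_tumour.mean(axis=1)

# "Normalized" comparison: tumour-vs-normal change within each tissue first,
# then the difference of those changes between the two cancer types.
thyroid_lfc = tumour_vs_normal_lfc(thyroid_tumour, thyroid_normal)
lung_lfc = tumour_vs_normal_lfc(lung_tumour, lung_normal)
normalised = thyroid_lfc - lung_lfc

print(naive)       # tissue marker shows the largest difference
print(normalised)  # only the tumour-specific gene remains non-zero
```

In this toy case the naive contrast is largest for the tissue marker, while the difference of within-tissue changes highlights only the gene altered in the thyroid tumour.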

We have some ideas on how to do this ourselves, but we suppose that this is not an uncommon task. Has anyone heard of this "normalization" approach, and how is it usually done? We suppose that this needs to be done when clustering cancer types in order to see meaningful differences, but we could not find it in the literature we read.

We hope we have stated our problem comprehensibly; if not, feel free to ask. Thanks for your help!

modified 2.6 years ago by Kevin Blighe (63k) • written 2.6 years ago by tamaraslosarek (10)

What about this: Genetic effects on gene expression across human tissues

written 2.6 years ago by Buffo (1.8k)

Can you block on tissue origin? For example, in the same way that you might incorporate a batch effect (~ batch + group), you could instead incorporate tissue origin (~ tissue + tumour).
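
To illustrate what a design such as `~ tissue + tumour` expresses, here is a rough sketch using ordinary least squares in NumPy. This is not how a dedicated differential expression package fits its models; the 0/1 sample coding, the variable names, and the noise-free toy values are all assumptions made for illustration:

```python
import numpy as np

# Sample annotation (hypothetical): 8 samples, both tissues with normal and tumour.
tissue = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = lung, 1 = thyroid
tumour = np.array([0, 0, 1, 1, 0, 0, 1, 1])  # 0 = normal, 1 = tumour

# Design matrix for "~ tissue + tumour": intercept, tissue term, tumour term.
design = np.column_stack([np.ones(tissue.size), tissue, tumour])

# Toy log-expression for two genes (genes x samples), without noise:
#   gene 0: baseline 5, tissue effect +4, tumour effect +2
#   gene 1: baseline 6, tissue effect +3, no tumour effect
expr = np.vstack([
    5 + 4 * tissue + 2 * tumour,
    6 + 3 * tissue + 0 * tumour,
]).astype(float)

# Fit every gene at once; coef has shape (3, n_genes).
coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)

tissue_effect = coef[1]  # absorbed by the blocking term
tumour_effect = coef[2]  # tumour effect adjusted for tissue of origin

print(tissue_effect)
print(tumour_effect)
```

Note that this additive model estimates a single tumour effect shared by both tissues. Asking whether the tumour effect differs between thyroid and lung would require an additional tissue-by-tumour interaction term.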

written 2.6 years ago by James Ashmore (3.0k)
2.6 years ago, Kevin Blighe (63k) wrote:

Hey, you could try:

  1. Obtain healthy / normal tissue-specific expression data from one or more online databases (see here: How Can We Get Tissue Specific Genes?; also look up FANTOM5).
  2. Using these databases, determine which genes are specific to your tissues of interest (e.g. thyroid). To do this, convert the downloaded data (preferably normally distributed) to Z-scores. 'Tissue-specific' genes will then have high Z-scores in the tissues in which they are most expressed, whereas ubiquitously expressed genes will have low Z-scores.

After that, when you conduct your differential expression comparison in tumours between thyroid and lung, you can filter the list of differentially expressed genes against the tissue-specific genes identified by the 2-step approach above. This is the simplistic approach.
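
The two steps and the subsequent filtering could be sketched roughly as follows. The gene names, expression values, the cutoff of 1.5, and the helper `tissue_specific_genes` are all hypothetical, and the sketch reads "filter" as removing tissue-specific genes from the differential expression list, which is one possible interpretation:

```python
import numpy as np

def tissue_specific_genes(expr, gene_ids, tissue_names, tissue, z_cutoff=1.5):
    """Return genes whose Z-score in `tissue` exceeds `z_cutoff`.

    expr is a genes x tissues array of healthy-tissue expression
    (ideally log-transformed so that values are roughly normal).
    """
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = np.nan  # flat genes get an undefined Z-score
    z = (expr - mean) / sd
    col = tissue_names.index(tissue)
    # Comparisons with NaN are False, so flat genes are never selected.
    return {g for g, score in zip(gene_ids, z[:, col]) if score > z_cutoff}

# Toy healthy-tissue reference (made-up values, not from any database).
tissues = ["thyroid", "lung", "liver", "brain", "heart"]
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
reference = np.array([
    [10.0, 1.0, 1.0, 1.0, 1.0],  # GENE_A: thyroid-specific
    [1.0, 10.0, 1.0, 1.0, 1.0],  # GENE_B: lung-specific
    [5.0, 5.0, 5.0, 5.0, 5.0],   # GENE_C: perfectly flat
    [5.0, 5.2, 4.9, 5.1, 4.8],   # GENE_D: ubiquitous with small variation
])

specific = (
    tissue_specific_genes(reference, genes, tissues, "thyroid")
    | tissue_specific_genes(reference, genes, tissues, "lung")
)

# Hypothetical result of a thyroid-vs-lung tumour comparison.
de_genes = ["GENE_A", "GENE_B", "GENE_D"]
filtered = [g for g in de_genes if g not in specific]

print(sorted(specific))
print(filtered)
```

With only a handful of tissues the attainable Z-score is bounded (about 1.79 for five tissues when using the sample standard deviation), so any cutoff has to be chosen with the size of the reference panel in mind.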

There should be a way, however, to actually 'normalise' your tumour expression data based on the results that you obtain from the tissue-specific databases. For example, you could supply the healthy / normal tissue-specific data as priors to an empirical Bayesian regression model and then adjust your expression data based on these priors.

Also keep in mind that 'thyroid' and 'lung' each refer to many different cell and tissue types.


modified 19 months ago • written 2.6 years ago by Kevin Blighe (63k)

Hi Kevin, thank you for the very helpful reply! Just two short (and hopefully quick to answer) follow-up questions: Do you know of any literature in which these or similar methods were used? And, keeping in mind that there are many different cell and tissue types, would you agree with us that this normalization approach is, in general, a meaningful thing to do?

written 2.5 years ago by tamaraslosarek (10)

Hello, sorry, no published literature behind this. However, it is part of work that is currently under peer review.

Yes, I believe you must correct / adjust for the different tissues in this situation.

modified 2.5 years ago • written 2.5 years ago by Kevin Blighe (63k)

Thanks again, could you maybe drop us a link should you notice the work was approved and published?

written 2.5 years ago by tamaraslosarek (10)

Sure, although I post in a lot of threads here, so it will be easy to forget. Maybe you could contact me at a later time via my email on GitHub.

written 2.5 years ago by Kevin Blighe (63k)

Hi Kevin, regarding "could you maybe drop us a link should you notice the work was approved and published?": could you share the link to the reference mentioned in the post above?

written 14 months ago by Natasha (40)

My former colleagues are dreadfully slow with it, and I do not currently know the status. I have since published results from other studies that only began after I left that group.

written 14 months ago by Kevin Blighe (63k)

