Question: Normalizing transcriptome data by tissue type
19 months ago by
tamaraslosarek10 wrote:

Hi guys,

We are working on a university project in which we want to find genes that discriminate between different cancer types. For that we are using gene expression data from the TCGA dataset.

A naive approach would be to simply run feature selection on the tumor data of each type. However, we assume that this would identify genes that are specific to the cell type of origin rather than to the tumor. For example, we want to compare thyroid and lung cancer. Using only tumor data, we would expect to find differentially expressed genes that are specific not to the tumor but to the original cell type. So we want to "normalize" thyroid tumor data against healthy thyroid tissue to first find discriminating genes for thyroid tumor, which can then be compared with the "normalized" genes for lung cancer.

We have some ideas about how to do this ourselves, but we suppose that this is not an uncommon task. Has anyone heard of this "normalization" approach, and how is it usually done? We suppose that this needs to be done when clustering cancer types in order to see meaningful differences, but we could not find it in the literature we read.

We hope we have stated our problem comprehensibly; if not, feel free to ask. Thanks for your help!

ADD COMMENTlink modified 19 months ago by Kevin Blighe46k • written 19 months ago by tamaraslosarek10

What about this: Genetic effects on gene expression across human tissues

ADD REPLYlink written 19 months ago by Buffo1.6k

Can you block on tissue origin? For example, in the same way that you might incorporate a batch effect (~ batch + group), you would instead incorporate tissue origin (~ tissue + tumour).
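
As a rough illustration of what blocking on tissue means, here is a minimal sketch in Python that fits the additive model ~ tissue + tumour for a single gene by ordinary least squares. The function name, the 0/1 coding and the toy values are assumptions made up for this example; in practice one would use a dedicated differential expression package rather than plain least squares.

```python
import numpy as np

def fit_additive_model(expr, tissue, tumour):
    """Fit expr ~ intercept + tissue + tumour for one gene by least squares.

    expr   : log-scale expression values, one per sample
    tissue : 0/1 per sample (hypothetical coding: 0 = thyroid, 1 = lung)
    tumour : 0/1 per sample (0 = normal, 1 = tumour)
    Returns [intercept, tissue_effect, tumour_effect].
    """
    expr = np.asarray(expr, dtype=float)
    design = np.column_stack([
        np.ones(len(expr)),               # intercept
        np.asarray(tissue, dtype=float),  # blocking term: tissue of origin
        np.asarray(tumour, dtype=float),  # effect of interest: tumour vs normal
    ])
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return coef

# Toy, noise-free data (invented): baseline 5, tissue shift 3, tumour shift 1.5
tissue = np.array([0, 0, 0, 0, 1, 1, 1, 1])
tumour = np.array([0, 0, 1, 1, 0, 0, 1, 1])
expr = 5.0 + 3.0 * tissue + 1.5 * tumour

coef = fit_additive_model(expr, tissue, tumour)
print(coef)
```

The tumour coefficient is estimated after the tissue difference has been absorbed by the tissue column. Note that this additive model assumes the same tumour effect in both tissues; asking whether the tumour effect differs between thyroid and lung would need an extra interaction column (tissue * tumour), which goes beyond the formula given above.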

ADD REPLYlink written 19 months ago by James Ashmore2.6k
19 months ago by
Kevin Blighe46k wrote:

Hey, you could try:

  1. Obtain healthy / normal tissue-specific expression data from one or more online databases (see here: How Can We Get Tissue Specific Genes? - also look up FANTOM5)
  2. Using these databases, determine which genes are specific to your tissues of interest (e.g. thyroid). To do this, just convert the downloaded data (preferably normally distributed) to Z-scores. 'Tissue-specific' genes will then have high Z-scores in the tissues in which they are most expressed, whereas ubiquitously expressed genes will have low Z-scores

After that, when you conduct your differential expression comparison in tumours between thyroid and lung, you can just filter the list of differentially expressed genes for the genes encountered via the above 2-step approach. This is the simplistic approach.
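
The two steps and the filter could be sketched roughly as follows. This is only an illustration under stated assumptions: Z-scores are computed per gene across tissues, the cutoff of 1.5 is arbitrary, the gene names and expression values are invented toy data, and the filter is read here as removing the tissue-specific genes from the differential expression list.

```python
import numpy as np

def tissue_specific_genes(expr, genes, tissues, tissue, z_cutoff=1.5):
    """Return the set of genes whose Z-score in `tissue` exceeds z_cutoff.

    expr    : genes x tissues matrix of log-scale normal-tissue expression
    genes   : list of gene names, one per row
    tissues : list of tissue names, one per column
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = np.nan          # flat genes get an undefined Z-score
    z = (expr - mean) / sd        # per-gene Z-score across tissues
    col = tissues.index(tissue)
    # Comparisons with NaN are False, so flat genes are never selected
    return {g for g, score in zip(genes, z[:, col]) if score > z_cutoff}

# Toy normal-tissue atlas (hypothetical genes and values)
tissues = ["thyroid", "lung", "liver", "brain", "kidney"]
genes = ["geneA", "geneB", "geneC", "geneD"]
atlas = [
    [10, 1, 1, 1, 1],   # geneA: high only in thyroid
    [1, 10, 1, 1, 1],   # geneB: high only in lung
    [8, 8, 8, 8, 8],    # geneC: ubiquitous, flat
    [3, 4, 3, 4, 3],    # geneD: mild variation, not tissue-specific
]

thyroid_specific = tissue_specific_genes(atlas, genes, tissues, "thyroid")
lung_specific = tissue_specific_genes(atlas, genes, tissues, "lung")

# Hypothetical result of a thyroid-vs-lung tumour comparison
de_genes = ["geneA", "geneB", "geneD"]
specific = thyroid_specific | lung_specific
filtered = [g for g in de_genes if g not in specific]
print(filtered)
```

With only a handful of tissues the largest possible Z-score is small, so the cutoff has to be chosen with the number of tissues in the database in mind.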

There should be a way, however, to actually 'normalise' your tumour expression data based on the results that you obtain from the tissue-specific databases. For example, one could supply the healthy / normal tissue-specific data as priors to an empirical Bayesian regression model and then adjust the expression data based on these priors.

Also keep in mind that 'thyroid' and 'lung' refer to many different cell- and tissue-types.

Kevin

ADD COMMENTlink modified 7 months ago • written 19 months ago by Kevin Blighe46k

Hi Kevin, thank you for the very helpful reply! Just two short (and hopefully short to answer) follow-up questions: Do you know of any literature in which these or similar methods were used? And, keeping in mind that there are many different cell and tissue types, would you agree with us that this normalization approach is in general a meaningful thing to do?

ADD REPLYlink written 18 months ago by tamaraslosarek10

Hello, sorry, no published literature behind this. However, it is part of work that is currently under peer review.

Yes, I believe you must correct / adjust for the different tissues in this situation.

ADD REPLYlink modified 18 months ago • written 18 months ago by Kevin Blighe46k

Thanks again, could you maybe drop us a link should you notice the work was approved and published?

ADD REPLYlink written 18 months ago by tamaraslosarek10

Sure, although I post in a lot of threads here, so it will be easy to forget. Maybe you could contact me at a later time via my email on GitHub.

ADD REPLYlink written 18 months ago by Kevin Blighe46k

Hi Kevin, regarding "could you maybe drop us a link should you notice the work was approved and published?": could you share the link to the reference mentioned in the above post?

ADD REPLYlink written 9 weeks ago by Natasha30

My former colleagues are dreadfully slow with it - I do not currently know the status. I have published results from other studies that even began in the time after I left that group.

ADD REPLYlink written 9 weeks ago by Kevin Blighe46k