Question

Using IMmuno-PREdictive Score (IMPRES)

Asked 10 months ago by JACKY ▴ 140

I have multiple bulk RNA-seq datasets, all provided as TPM. However, one or two of them have undergone additional normalization steps on top of TPM. To put all datasets on a comparable scale, I applied upper quartile normalization (UQN) followed by a log2 transformation.
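
A minimal sketch of the per-sample UQ + log2 step described above, in plain Python on a gene-to-TPM dict. Assumptions: the upper quartile is taken over the sample's non-zero values (one common variant), the percentile uses linear interpolation, and `scale` and `pseudocount` are hypothetical parameters, not values taken from this post.

```python
import math


def percentile(values, p):
    """Percentile with linear interpolation between order statistics (p in [0, 1])."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def uq_log2(sample, scale=1.0, pseudocount=1.0):
    """Divide one sample (gene -> TPM) by its upper quartile, then take log2(x + pseudocount).

    Assumption: the upper quartile is computed over non-zero values only.
    `scale` is a hypothetical constant multiplier applied after the division.
    """
    nonzero = [v for v in sample.values() if v > 0]
    if not nonzero:
        raise ValueError("sample has no non-zero values")
    uq = percentile(nonzero, 0.75)
    return {gene: math.log2(v / uq * scale + pseudocount) for gene, v in sample.items()}


sample = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0, "E": 0.0}
print(percentile([10.0, 20.0, 30.0, 40.0], 0.75))  # 32.5
normalized = uq_log2(sample)
```

Because every gene in a sample is divided by the same factor, the ordering of genes within that sample is unchanged by this step.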

I now want to integrate these datasets and score them with the IMPRES algorithm, which predicts response to immunotherapy. However, I have some reservations.

IMPRES compares the expression levels of pairs of immune checkpoint genes within each sample and scores the sample according to how many of these pairwise relations hold. I am unsure how best to normalize the data before running IMPRES.
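
A simplified sketch of the kind of within-sample pairwise comparison IMPRES relies on: for each (gene A, gene B) pair, check whether A is expressed higher than B in the same sample, and count how many pairs hold. The gene pairs below are hypothetical placeholders, not the checkpoint-gene pairs from the IMPRES paper, and treating ties as "not higher" is an assumption.

```python
# Hypothetical placeholder pairs; the published IMPRES gene pairs are NOT reproduced here.
PAIRS = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_E", "GENE_F")]


def pairwise_score(sample, pairs=PAIRS):
    """Count the (first, second) pairs where sample[first] > sample[second] (ties count as 0)."""
    return sum(1 for a, b in pairs if sample[a] > sample[b])


sample = {"GENE_A": 5.0, "GENE_B": 2.0, "GENE_C": 1.0, "GENE_D": 3.0, "GENE_E": 7.0, "GENE_F": 7.0}
score = pairwise_score(sample)
print(score)  # 1
```

Only values from the same sample are compared, so the score of one sample does not depend on any other sample.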

Would a log2 transformation substantially change this process? And what effect would UQN have?

I am also unsure whether it is better to run IMPRES on each dataset independently, or to first combine the datasets, correct any batch effects arising from their different sources, and then run IMPRES. Any help would be much appreciated. Thank you.
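
One consideration for the combine-then-correct option: batch correction methods typically adjust each gene separately, so two genes in the same sample can be shifted by different amounts and a within-sample comparison can flip. The sketch below uses crude per-gene mean-centering within one batch as a simplified stand-in (it is not ComBat or any specific published method); the gene names and values are hypothetical.

```python
def center_genes(samples):
    """Subtract each gene's mean across the given samples (crude per-batch centering).

    A deliberately simplified stand-in for batch correction, NOT ComBat.
    """
    genes = list(samples[0].keys())
    means = {g: sum(s[g] for s in samples) / len(samples) for g in genes}
    return [{g: s[g] - means[g] for g in genes} for s in samples]


def pairwise_score(sample, pairs):
    """Count pairs where the first gene is higher than the second within one sample."""
    return sum(1 for a, b in pairs if sample[a] > sample[b])


pairs = [("G1", "G2")]  # one hypothetical gene pair
batch = [{"G1": 5.0, "G2": 3.0}, {"G1": 6.0, "G2": 1.0}]

before = [pairwise_score(s, pairs) for s in batch]
after = [pairwise_score(s, pairs) for s in center_genes(batch)]
print(before, after)  # [1, 1] [0, 1]
```

In the first sample G1 is higher than G2 before centering but lower after it. Since a pairwise score is computed per sample, scoring each dataset separately on its uncorrected values sidesteps this particular issue; whether that suits the downstream comparison is a separate judgment.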

Tags: scaling, TPM, cancer, normalization, r

Updated 10 months ago by Ram 43k • written 10 months ago by JACKY ▴ 140

You UQ normalized TPM values across datasets?

Reply • 10 months ago by Ram 43k

Yes, I applied UQ normalization to each dataset individually. I needed to because some of the larger datasets I collected had gone through additional processing beyond TPM, including batch effect correction, among other things. (Unfortunately, I don't have access to the count data.)

Consequently, I needed to harmonize the scale across all datasets. I also applied a log2 transformation, but I still have reservations about whether that step is appropriate, especially for the IMPRES algorithm.

Reply • 10 months ago by JACKY ▴ 140

The log2 doesn't matter here: it only transforms the range of values, and it doesn't affect compatibility IMO.
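
A quick way to check this for a score built only from within-sample comparisons: UQN divides every gene in a sample by the same positive factor, and log2(x + 1) is strictly increasing, so "is gene A higher than gene B in this sample" gives the same answer before and after both steps. The pairs, values and UQ factor below are made up for illustration.

```python
import math


def pairwise_score(sample, pairs):
    """Count pairs where the first gene is higher than the second within one sample."""
    return sum(1 for a, b in pairs if sample[a] > sample[b])


def uq_log2(sample, uq, pseudocount=1.0):
    """Divide every gene in the sample by the same factor, then take log2(x + pseudocount)."""
    return {g: math.log2(v / uq + pseudocount) for g, v in sample.items()}


pairs = [("G1", "G2"), ("G3", "G4"), ("G5", "G6")]  # hypothetical pairs
sample = {"G1": 12.0, "G2": 3.5, "G3": 0.8, "G4": 9.1, "G5": 40.0, "G6": 22.0}

raw = pairwise_score(sample, pairs)
transformed = pairwise_score(uq_log2(sample, uq=17.3), pairs)
print(raw, transformed)  # 2 2
```

This argument covers within-sample comparisons only; it says nothing about tools whose output depends on the absolute scale of the values.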

TPM is not comparable across samples, and applying additional transformations only tortures the data further. Is it possible to email the sources and request the count data? That would be the best way to deal with this. TCGA uses FPKM-UQ, but it's all buyer beware, and I am highly skeptical of these impossible-burger, beyond-meat, ultra-processed metrics.
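
For reference, a minimal sketch of how TPM is computed from raw counts for a single sample: divide each gene's count by its length in kilobases, then rescale so the sample sums to one million. Because the denominator depends on every gene in that sample, TPM is a within-sample proportion, which is part of why having the counts makes cross-dataset handling easier. Gene names, counts and lengths are made up; real pipelines usually use effective transcript lengths.

```python
def tpm(counts, lengths_bp):
    """TPM for one sample. counts: gene -> read count; lengths_bp: gene -> length in bp."""
    # Reads per kilobase for each gene
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Scale so that the sample's values sum to one million
    total = sum(rpk.values())
    return {gene: r / total * 1e6 for gene, r in rpk.items()}


values = tpm({"A": 100, "B": 300}, {"A": 1000, "B": 3000})
print(values)  # {'A': 500000.0, 'B': 500000.0}
```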

Reply • 10 months ago by Ram 43k

I understand, but unfortunately I can't request the count data. I'm not analyzing the expression data directly; instead, I'm feeding it into tools such as CIBERSORT, IMPRES, TIDE and others. Do you think UQ-normalized TPM might not work well for these?

By the way, the TIDE and xCell programs need the data to be TPM normalized. It says that in their guide. I'm not sure about the others.

Reply • 10 months ago by JACKY ▴ 140

Quoting you: "By the way, the TIDE and xCell programs need the data to be TPM normalized. It says that in their guide. I'm not sure about the others."

If they expect TPM, you should be fine with TPM, but adding further "normalization" on top of it is not recommended. I've never used TIDE, xCell or CIBERSORT, but I think you should read through their documentation and papers to make sure they can handle data containing batch effects.
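
If a tool expects plain TPM, the extra steps can in principle be undone, because UQN only divided each sample by one constant: invert the log2 and rescale each sample so it sums to one million. This sketch assumes the same pseudocount was used, no genes were dropped along the way, and the input values summed to one million per sample. It recovers the values that went into UQN (which, for some of these datasets, were already processed beyond TPM), not TPM derived from raw counts.

```python
import math


def undo_log2(sample, pseudocount=1.0):
    """Invert log2(x + pseudocount) for one sample (gene -> value)."""
    return {gene: 2 ** v - pseudocount for gene, v in sample.items()}


def rescale_to_million(sample):
    """Rescale one sample so its values sum to 1e6, i.e. back to a TPM-like scale."""
    total = sum(sample.values())
    return {gene: v / total * 1e6 for gene, v in sample.items()}


# Round trip on made-up values: TPM -> divide by a UQ factor -> log2(x + 1) -> back
original = {"A": 250000.0, "B": 750000.0}
uq_factor = 500000.0  # hypothetical upper-quartile value
logged = {g: math.log2(v / uq_factor + 1.0) for g, v in original.items()}
recovered = rescale_to_million(undo_log2(logged))
```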

Reply • 10 months ago by Ram 43k