Question: how to use ESTIMATE to infer tumor purity and stromal score from RNA-seq data?
lhaiyan3 (United States) wrote, 2.1 years ago:

Dear all:

Has anyone used ESTIMATE (http://bioinformatics.mdanderson.org/main/ESTIMATE:Overview) to infer tumor purity and stromal score from RNA-seq data? I am not clear on how to use this tool or what input file format it expects. The documentation shows only the few steps below, and I could not figure out how to load my own data and run the program. Thanks very much for your help.

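library(estimate)
OvarianCancerExpr <- system.file("extdata", "sample_input.txt", package="estimate")
# keep only the genes ESTIMATE uses and write them in GCT format
filterCommonGenes(input.f=OvarianCancerExpr, output.f="OV_10412genes.gct", id="GeneSymbol")
# compute the stromal, immune, and ESTIMATE scores
estimateScore("OV_10412genes.gct", "OV_estimate_score.gct", platform="affymetrix")
# plot tumor purity for sample s516
plotPurity(scores="OV_estimate_score.gct", samples="s516", platform="affymetrix")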

best

Haiyan Lei

sina.nassiri (Switzerland/Lausanne) wrote, 21 months ago:

The ESTIMATE algorithm (Yoshihara et al. 2013 Nature Communications) consists of two steps. In the first step, enrichment scores are calculated using single-sample GSEA (Barbie et al. 2009 Nature). Note that although immune cells are essentially part of the stroma, Yoshihara et al. calculated two separate enrichment scores: an "immune" score based on immune-related genes, and a "stromal" score based on non-immune genes. The final ESTIMATE score is the sum of the immune and stromal enrichment scores. In the second step, the ESTIMATE enrichment score is converted to tumor purity using the following formula:

Tumor purity = cos(0.6049872018 + 0.0001467884 × ESTIMATE score)

where "Tumor purity" represents ABSOLUTE-based tumor purity (ABSOLUTE is another algorithm, which computes tumor purity from somatic DNA copy number alterations), and "ESTIMATE score" represents the ESTIMATE enrichment score obtained from TCGA Affymetrix data, as explained above. The key point is that this calibration formula was derived using only Affymetrix data, and therefore cannot be used to convert RNAseq-based ESTIMATE score to tumor purity. That being said, you may still apply the single-sample GSEA algorithm to properly normalized RNAseq data to obtain ESTIMATE enrichment scores, and include them as a covariate in your downstream analysis to account for tumor purity.
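
As a rough illustration of the two steps, the arithmetic can be sketched in base R. The stromal and immune values below are made-up numbers, and `estimate_to_purity` is a hypothetical helper name rather than a function from the estimate package; the constants are the ones in the formula above.

```r
# Hypothetical helper: apply the Affymetrix-derived calibration formula.
# Only meaningful for ESTIMATE scores computed from Affymetrix data.
estimate_to_purity <- function(estimate_score) {
  cos(0.6049872018 + 0.0001467884 * estimate_score)
}

# Made-up enrichment scores for three samples (illustration only).
stromal_score <- c(s1 = -500, s2 = 200, s3 = 1200)
immune_score  <- c(s1 = -300, s2 = 400, s3 = 1500)

# Step 1: the ESTIMATE score is the sum of the two enrichment scores.
estimate_score <- stromal_score + immune_score

# Step 2: convert the combined score to a purity estimate.
purity <- estimate_to_purity(estimate_score)
print(round(purity, 3))
```

In this toy example a higher ESTIMATE score (more stromal and immune signal) maps to a lower purity estimate, which is the direction one would expect.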


This does not answer the question.

Reply written 19 months ago by friendshipweekpoem0

"The key point is that this calibration formula was derived using only Affymetrix data, and therefore cannot be used to convert RNAseq-based ESTIMATE score to tumor purity" ... How does this not answer the question?

Reply written 18 months ago by sina.nassiri

I think you can definitely use ESTIMATE with RNA-seq data as this was done by the authors themselves. See the tool's website.

Reply written 6 months ago by Martombo

First of all, "as this was done by X" is rarely the right way to verify the assumptions of a computational algorithm. Second, ESTIMATE is published and its R code is publicly available for anyone to review. By default, the ESTIMATE R package accepts only "affymetrix", "agilent", or "illumina" microarray data as input. Can you feed normalized RNAseq data to ESTIMATE as input? You surely can! ESTIMATE uses single-sample GSEA to compute immune and stromal scores, then adds them up to get the ESTIMATE score, which you can use in downstream analyses. In fact, this is what their website provides for TCGA RNAseq data. However, you can't plug these scores into their formula to calculate tumor purity, because that formula was derived specifically for microarray data.
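
A minimal sketch of that workflow for your own data, assuming a normalized expression matrix with gene symbols as row names and samples as columns. The helper names, file names, and toy values are hypothetical. The choice of platform="illumina" is also an assumption: from the package code it appears that only platform="affymetrix" adds a tumor purity row, so the other values should return just the stromal, immune, and ESTIMATE scores. The estimate calls are skipped if the package is not installed.

```r
# Hypothetical helper: write a matrix as a tab-delimited text file with
# gene symbols in the first column and one column per sample.
write_estimate_input <- function(expr, path) {
  write.table(expr, file = path, sep = "\t", quote = FALSE)
  invisible(path)
}

# Hypothetical wrapper around the estimate package functions.
run_estimate_scores <- function(expr, out_dir = tempdir(), platform = "illumina") {
  library(estimate)
  input_file <- write_estimate_input(expr, file.path(out_dir, "my_expr.txt"))
  gct_file   <- file.path(out_dir, "my_expr_genes.gct")
  score_file <- file.path(out_dir, "my_expr_estimate_score.gct")
  # Keep only the genes that ESTIMATE uses, matched by gene symbol.
  filterCommonGenes(input.f = input_file, output.f = gct_file, id = "GeneSymbol")
  # Assumption: a non-Affymetrix platform value, so no purity conversion.
  estimateScore(gct_file, score_file, platform = platform)
  # GCT files carry two header lines before the score table.
  read.table(score_file, skip = 2, header = TRUE, sep = "\t")
}

# Toy matrix (made-up values) to show the expected file layout.
toy_expr <- matrix(
  c(5.1, 4.8, 6.0,
    2.3, 2.9, 1.7,
    7.4, 7.0, 8.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), c("s1", "s2", "s3"))
)
toy_file <- write_estimate_input(toy_expr, file.path(tempdir(), "toy_expr.txt"))
toy_back <- read.table(toy_file, header = TRUE, row.names = 1, sep = "\t")

# Real run, here on the example data shipped with the package; replace
# 'expr' with your own normalized RNAseq matrix.
if (requireNamespace("estimate", quietly = TRUE)) {
  example_file <- system.file("extdata", "sample_input.txt", package = "estimate")
  expr <- as.matrix(read.table(example_file, header = TRUE, row.names = 1, sep = "\t"))
  scores <- run_estimate_scores(expr)
  print(scores[, 1:4])
}
```

The resulting score rows can then be merged into your sample annotation and used as covariates, rather than being converted to purity.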

Reply written 5 months ago by sina.nassiri

I vaguely remember the opinion that statistical methods developed on array data are not suited to RNA-Seq, and that this has something to do with RNA-Seq being a zero-sum game (the total number of reads sequenced is fixed). But I cannot remember the details. Can you explain this in a bit more detail? Thanks

Reply written 22 days ago by CY460