Why do survival plots look different with the same data?
2.7 years ago
Biologist ▴ 200

Hello,

The Human Protein Atlas (HPA) survival plot for GPAM, based on the best separation of high- and low-expression samples at an expression cutoff of 23.6 FPKM, looks like this:

[Figure: survival plot of high- vs. low-GPAM-expression samples (Human Protein Atlas)]

I took the GPAM FPKM data from that database and merged it with the clinical data. Everything is stored in a data frame, df:

  times bcr_patient_barcode patient.vital_status      FPKM
1   724        TCGA-2Y-A9GS                    1      30.3
2  1624        TCGA-2Y-A9GT                    1       5.6
3  1569        TCGA-2Y-A9GU                    0      26.6
4  2532        TCGA-2Y-A9GV                    1      18.4
5  1271        TCGA-2Y-A9GW                    1       4.7
6  2442        TCGA-2Y-A9GX                    0      19.4
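As a minimal sketch of the merge step, assuming the HPA expression file is keyed by TCGA sample barcodes (the `-01A` suffixes below are hypothetical) while the clinical table uses 12-character patient barcodes: trimming the sample barcode before merging, then checking for duplicated patients, helps make sure no patient is silently dropped or counted twice. The `expr` and `clin` tables are hypothetical stand-ins built from the first three rows shown above.

```r
# Hypothetical expression table keyed by sample barcode (suffixes assumed)
expr <- data.frame(
  sample_barcode = c("TCGA-2Y-A9GS-01A", "TCGA-2Y-A9GT-01A", "TCGA-2Y-A9GU-01A"),
  FPKM = c(30.3, 5.6, 26.6),
  stringsAsFactors = FALSE
)
# Clinical table keyed by the 12-character patient barcode
clin <- data.frame(
  bcr_patient_barcode = c("TCGA-2Y-A9GS", "TCGA-2Y-A9GT", "TCGA-2Y-A9GU"),
  times = c(724, 1624, 1569),
  patient.vital_status = c(1, 1, 0),
  stringsAsFactors = FALSE
)

# "TCGA-2Y-A9GS-01A" -> "TCGA-2Y-A9GS" so the two tables share a key
expr$bcr_patient_barcode <- substr(expr$sample_barcode, 1, 12)
df <- merge(clin, expr[, c("bcr_patient_barcode", "FPKM")],
            by = "bcr_patient_barcode")

# A patient with several tumour samples would appear twice after the merge
stopifnot(!anyDuplicated(df$bcr_patient_barcode))
nrow(df)
```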


I used the survminer package to find the cutpoint that divides low- and high-expression samples.

library(survminer)

surv_rnaseq.cut <- surv_cutpoint(
  df,
  time = "times",
  event = "patient.vital_status",
  variables = c("FPKM")
)

summary(surv_rnaseq.cut)
cutpoint statistic
GPAM_FPKM     23.6  2.834408
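To my understanding, `surv_cutpoint()` picks the cutpoint with the maximally selected rank statistic (via the maxstat package), by default (`minprop = 0.1`) only considering splits that leave at least 10% of samples in each group; the `statistic` column above is that standardized log-rank statistic. HPA describes its "best separation" cutoff as the FPKM value giving the lowest log-rank p-value, chosen among values between the 20th and 80th percentiles (my reading of their documentation, worth double-checking). The grid search below is a sketch of that second idea, not HPA's actual code; `best_cutoff()` is a hypothetical helper and the data are only the six example rows above.

```r
library(survival)

# Six example rows copied from the data frame above
df <- data.frame(
  times = c(724, 1624, 1569, 2532, 1271, 2442),
  patient.vital_status = c(1, 1, 0, 1, 1, 0),
  FPKM = c(30.3, 5.6, 26.6, 18.4, 4.7, 19.4)
)

# Hypothetical helper: try every observed FPKM value between the given
# quantiles as a cutoff and keep the one with the largest log-rank
# chi-square, i.e. the lowest log-rank p-value
best_cutoff <- function(data, lower = 0.2, upper = 0.8) {
  q <- quantile(data$FPKM, c(lower, upper))
  candidates <- sort(unique(data$FPKM[data$FPKM >= q[1] & data$FPKM <= q[2]]))
  res <- t(sapply(candidates, function(k) {
    d <- data
    d$grp <- ifelse(d$FPKM > k, "high", "low")
    if (length(unique(d$grp)) < 2) return(c(cut = k, chisq = NA, p = NA))
    lr <- survdiff(Surv(times, patient.vital_status) ~ grp, data = d)
    c(cut = k, chisq = lr$chisq,
      p = pchisq(lr$chisq, df = 1, lower.tail = FALSE))
  }))
  res[which.max(res[, "chisq"]), ]
}

best <- best_cutoff(df)
best
```

Because the two methods use different statistics and search ranges, they can disagree on the cutoff even with identical data; here both happen to report 23.6.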


Then the samples are categorized into high and low groups:

surv_rnaseq.cat <- surv_categorize(surv_rnaseq.cut)
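One detail that can shift a single sample between groups is how a value exactly at the cutoff is treated. As far as I know, `surv_categorize()` labels values `<= cutpoint` as "low"; whether HPA uses `>` or `>=` is an assumption I have not verified. The printed cutoff `23.6` may also be rounded. A minimal sketch comparing both conventions on the six example rows (`group_gt`, `group_ge`, `on_boundary` and `near_boundary` are hypothetical names):

```r
df <- data.frame(
  bcr_patient_barcode = c("TCGA-2Y-A9GS", "TCGA-2Y-A9GT", "TCGA-2Y-A9GU",
                          "TCGA-2Y-A9GV", "TCGA-2Y-A9GW", "TCGA-2Y-A9GX"),
  FPKM = c(30.3, 5.6, 26.6, 18.4, 4.7, 19.4),
  stringsAsFactors = FALSE
)
cutoff <- 23.6

# Convention 1 (assumed survminer behaviour): FPKM <= cutoff is "low"
df$group_gt <- factor(ifelse(df$FPKM > cutoff, "high", "low"),
                      levels = c("low", "high"))
# Convention 2: FPKM >= cutoff counts as "high"
df$group_ge <- factor(ifelse(df$FPKM >= cutoff, "high", "low"),
                      levels = c("low", "high"))

table(df$group_gt)
# Samples whose group depends on the convention
on_boundary <- df[df$FPKM == cutoff, ]
# Samples close to the printed cutoff, in case 23.6 is a rounded value
near_boundary <- df[abs(df$FPKM - cutoff) < 0.05, ]
```

In these six rows no sample sits near 23.6, so both conventions agree; on the full data set a non-empty `on_boundary` or `near_boundary` would be a candidate explanation for a 247 vs. 248 split.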


Then I plotted the data as follows:

library(survival)
library(survminer)  # ggsurvplot()
library(RTCGA)      # theme_RTCGA()

fit <- survfit(Surv(times, patient.vital_status) ~ FPKM,
               data = surv_rnaseq.cat)

pdf("Survival_high_vs_low.pdf", width = 10, height = 10)
p <- ggsurvplot(
  fit,                          # survfit object with calculated statistics
  risk.table = TRUE,            # show risk table
  pval = TRUE,                  # show p-value of log-rank test
  conf.int = TRUE,              # show confidence intervals for the survival curves
  xlim = c(0, 3000),            # narrower x axis; does not affect survival estimates
  break.x.by = 1000,            # x-axis breaks every 1000 days
  break.y.by = 0.1,             # y-axis breaks every 0.1
  ggtheme = theme_RTCGA(),      # theme for both plot and risk table
  risk.table.y.text.col = TRUE, # colour risk table text annotations
  risk.table.y.text = FALSE     # show bars instead of names in the risk table legend
)
print(p)  # explicit print so the plot reaches the PDF device (e.g. when the script is source()d)
dev.off()
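To compare with the HPA figure, it can help to print the group sizes and the log-rank p-value directly instead of reading them off the plot. A minimal sketch using `survdiff()` from the survival package, with the six example rows standing in for the full data and `FPKM_group` as a hypothetical column name:

```r
library(survival)

df <- data.frame(
  times = c(724, 1624, 1569, 2532, 1271, 2442),
  patient.vital_status = c(1, 1, 0, 1, 1, 0),
  FPKM = c(30.3, 5.6, 26.6, 18.4, 4.7, 19.4)
)
# Hypothetical grouping column using the 23.6 FPKM cutoff (FPKM <= 23.6 is "low")
df$FPKM_group <- factor(ifelse(df$FPKM > 23.6, "high", "low"),
                        levels = c("low", "high"))

fit <- survfit(Surv(times, patient.vital_status) ~ FPKM_group, data = df)
lr <- survdiff(Surv(times, patient.vital_status) ~ FPKM_group, data = df)
# Log-rank p-value: chi-square with (number of groups - 1) degrees of freedom
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)

fit$n    # patients per group, to compare with the HPA legend
p_value
```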


The survival plot from my analysis looks like this: [Figure: survival plot from my analysis]. I used the same data as the Human Protein Atlas database, but my plot looks different from the one in the database.

What could be the reason for this? Is it something in the Kaplan-Meier statistics?

Any help is appreciated.

RNA-Seq tcga survival fpkm survivalanalysis
2.7 years ago

I have nothing substantial to contribute, except that the details of the analysis matter. The Human Protein Atlas team themselves show that the same data can yield different-looking survival plots: https://www.proteinatlas.org/ENSG00000119927-GPAM/pathology/tissue/liver+cancer

Overall, the trend seems to be the same for your analysis and theirs, no? Do you know whether you used the same tools, settings and cut-offs as the HPA guys?


Yes, the trend looks the same, but in my plot there is a drop in the high-expression curve after 2000 days that I don't see in the HPA plot. I used the same cutoff (23.6 FPKM) that they used. I don't know what causes this small difference.
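A drop late in follow-up often reflects a very small risk set: when only a few high-expression patients are still followed after 2000 days, a single death produces a visible step. One way to check this (a sketch on the six example rows, with a hypothetical `FPKM_group` column) is to tabulate the number at risk at a few time points with `summary()` on the `survfit` object. The HPA plot may also use a different follow-up window or censoring rule, which I have not verified.

```r
library(survival)

df <- data.frame(
  times = c(724, 1624, 1569, 2532, 1271, 2442),
  patient.vital_status = c(1, 1, 0, 1, 1, 0),
  FPKM = c(30.3, 5.6, 26.6, 18.4, 4.7, 19.4)
)
df$FPKM_group <- factor(ifelse(df$FPKM > 23.6, "high", "low"),
                        levels = c("low", "high"))
fit <- survfit(Surv(times, patient.vital_status) ~ FPKM_group, data = df)

# Number still at risk and survival estimate at selected days; a stratum
# whose follow-up ends earlier has no row for the later time points
s <- summary(fit, times = c(0, 1000, 2000, 2500))
risk <- data.frame(strata = s$strata, time = s$time,
                   n.risk = s$n.risk, surv = round(s$surv, 3))
risk
```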


You have one sample fewer (247 instead of 248 in one group). Also, did you remove everything with FPKM < 1?


Yes, I see that I have one sample fewer. I guess it won't make much difference. In their analysis they removed genes with FPKM < 1; in my case I'm looking at only a single gene.
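To track down the one-sample difference, a sketch that (a) lists patients present in only one of two cohorts and (b) flags rows that `survfit()` would drop (missing time or status) or that some pipelines may exclude (zero or negative times, duplicated patients; whether HPA does so is an assumption). `compare_cohorts()` is a hypothetical helper, and the "other cohort" in the last lines is a made-up illustration; in practice it would be the patient list from the HPA download.

```r
df <- data.frame(
  times = c(724, 1624, 1569, 2532, 1271, 2442),
  bcr_patient_barcode = c("TCGA-2Y-A9GS", "TCGA-2Y-A9GT", "TCGA-2Y-A9GU",
                          "TCGA-2Y-A9GV", "TCGA-2Y-A9GW", "TCGA-2Y-A9GX"),
  patient.vital_status = c(1, 1, 0, 1, 1, 0),
  FPKM = c(30.3, 5.6, 26.6, 18.4, 4.7, 19.4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: patients present in only one of two cohorts
compare_cohorts <- function(mine, theirs) {
  list(only_mine = setdiff(mine, theirs), only_theirs = setdiff(theirs, mine))
}

# Rows that survfit() would drop (NA) or that deserve a second look
suspicious <- df[is.na(df$times) | is.na(df$patient.vital_status) |
                 df$times <= 0 | duplicated(df$bcr_patient_barcode), ]

# Illustration only: pretend the other cohort lacks the first patient
cohort_diff <- compare_cohorts(df$bcr_patient_barcode, df$bcr_patient_barcode[-1])
cohort_diff$only_mine
```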