Question: Why do survival plots look different with the same data?
2.4 years ago, Biologist wrote:


The survival plot based on the best separation of high- and low-expression samples of GPAM, with an expression cutoff of 23.6 FPKM, looks like the one below (this plot is from the Human Protein Atlas database):

Survival Plot between high and low samples of GPAM Expression

I took the GPAM FPKM data given in the above database and merged it with the clinical data. Everything is stored in a data frame df:


  times bcr_patient_barcode patient.vital_status      FPKM
1   724        TCGA-2Y-A9GS                    1      30.3
2  1624        TCGA-2Y-A9GT                    1       5.6
3  1569        TCGA-2Y-A9GU                    0      26.6
4  2532        TCGA-2Y-A9GV                    1      18.4
5  1271        TCGA-2Y-A9GW                    1       4.7
6  2442        TCGA-2Y-A9GX                    0      19.4

I used the survminer package to find the cutpoint that divides the samples into low and high expression.


library(survminer)
surv_rnaseq.cut <- surv_cutpoint(
  df,                              # merged expression + clinical data
  time = "times",
  event = "patient.vital_status",
  variables = c("FPKM")
)
summary(surv_rnaseq.cut)

          cutpoint statistic
GPAM_FPKM     23.6  2.834408

Then the categorization is done with surv_categorize(surv_rnaseq.cut).
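
For a single gene with a fixed cutoff, the grouping can also be reproduced by hand, which makes it easy to compare group sizes with the numbers at risk in a reference plot. This is only a minimal sketch: it assumes that values strictly above the cutoff count as "high" (how ties at the cutoff are handled is an assumption), and it uses just the six rows shown above under the illustrative name df_toy rather than the full cohort.

```r
# Toy data: the six rows shown in the post, not the full cohort.
# The name df_toy is illustrative only.
df_toy <- data.frame(
  times = c(724, 1624, 1569, 2532, 1271, 2442),
  patient.vital_status = c(1, 1, 0, 1, 1, 0),
  FPKM = c(30.3, 5.6, 26.6, 18.4, 4.7, 19.4)
)

cutoff <- 23.6  # cutpoint reported in the post

# Assumption: strictly above the cutoff is "high", everything else is "low".
df_toy$group <- ifelse(df_toy$FPKM > cutoff, "high", "low")

# Group sizes, to be compared with the numbers at risk at time 0.
group_sizes <- table(df_toy$group)
print(group_sizes)
```

If the group sizes differ from the reference at time 0, the discrepancy comes from sample selection or the cutoff rule rather than from the Kaplan-Meier estimate itself.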

Then, to plot the data, I did the following:

library(survival)
library(RTCGA)   # provides theme_RTCGA()

fit <- survfit(Surv(times, patient.vital_status) ~ FPKM,
               data = surv_categorize(surv_rnaseq.cut))
pdf("Survival_high_vs_low.pdf", width = 10, height = 10)
ggsurvplot(
  fit,                       # survfit object with calculated statistics.
  risk.table = TRUE,         # show risk table.
  pval = TRUE,               # show p-value of log-rank test.
  conf.int = TRUE,           # show confidence intervals for
                             # point estimates of survival curves.
  xlim = c(0, 3000),         # present narrower X axis, but not affect
                             # survival estimates.
  break.time.by = 1000,      # break X axis in time intervals of 1000.
  # (argument name lost in the source) = 0.1,
  ggtheme = theme_RTCGA(),   # customize plot and risk table with a theme.
  risk.table.y.text.col = T, # colour risk table text annotations.
  risk.table.y.text = FALSE  # show bars instead of names in text annotations
                             # in legend of risk table
)
dev.off()                    # close the PDF device so the file is written

The survival plot I got looks like this:

Survival plot from my analysis

Basically, I used the same data that was used in the Human Protein Atlas database, but the plot from my analysis looks different from the plot in the database.

What could be the reason for this? Is it the Kaplan-Meier statistics?

Any help is appreciated.

2.4 years ago, Friederike (United States) wrote:

I have nothing of substance to contribute, except that the actual details of the analysis matter: the Human Protein Atlas people themselves show that the same data can very well yield different-looking survival plots.

Overall, the trend seems to be the same for your analysis and theirs, no? Do you know whether you used the same tools, settings and cut-offs as the HPA guys?


Yes, the trend looks the same, but in my plot I see a sharp drop in the high-expression curve after 2000 days, which I didn't observe in the HPA plot. I used the same cutoff of 23.6 that they used. I don't know what causes that small difference.

Reply written 2.4 years ago by Biologist
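
One way to look into a late drop like the one described is to check how many patients are still at risk at that point: when the risk set is small, a single event produces a large step in the Kaplan-Meier curve. The sketch below uses made-up toy data (the names toy and fit_toy and all values are hypothetical, not the TCGA cohort) and only the survival package.

```r
library(survival)

# Hypothetical toy data, not the TCGA cohort: status 1 = event, 0 = censored.
toy <- data.frame(
  times  = c(300, 600, 900, 1200, 1500, 1800, 2100, 2400, 2700, 3000),
  status = c(1,   0,   1,   0,    1,    0,    0,    1,    0,    0)
)

fit_toy <- survfit(Surv(times, status) ~ 1, data = toy)

# summary() lists each event time with the number at risk and the estimate.
s <- summary(fit_toy)

# Size of the downward step at each event time.
step_size <- -diff(c(1, s$surv))

print(data.frame(time = s$time, n.risk = s$n.risk,
                 n.event = s$n.event, surv = s$surv, step = step_size))
```

In this toy example the last event occurs when only three patients remain at risk, so it causes the largest step even though it is a single event, just like the earlier ones. Running the same kind of summary on both analyses would show whether the late drop hinges on one patient.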

You have one sample fewer (247 instead of 248 for one group). Also: did you remove everything with FPKM < 1?

Reply written 2.4 years ago by Friederike

Yes, I see that in my case I have one sample fewer. I guess it won't make much difference. In their analysis they removed genes with FPKM < 1; in my case I'm looking at only a single gene.

Reply written 2.4 years ago by Biologist
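
To find out which patient accounts for a one-sample difference, the barcodes of the two analyses can be compared directly. A minimal sketch, assuming a reference barcode list is available: the vector reference below is hypothetical, and "TCGA-XX-XXXX" is a made-up placeholder, not a real sample.

```r
# Barcodes present in the merged data frame (the six shown in the post).
mine <- c("TCGA-2Y-A9GS", "TCGA-2Y-A9GT", "TCGA-2Y-A9GU",
          "TCGA-2Y-A9GV", "TCGA-2Y-A9GW", "TCGA-2Y-A9GX")

# Hypothetical reference list; "TCGA-XX-XXXX" is a made-up placeholder.
reference <- c(mine, "TCGA-XX-XXXX")

# Barcodes that are in the reference but missing from the merged data.
missing_from_mine <- setdiff(reference, mine)
print(missing_from_mine)
```

A patient can drop out during the merge, for example when the follow-up time or vital status is missing, so checking the clinical fields of any barcode found this way is a reasonable next step.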