Question: Disease-Free Survival Time & Event
Saman (U of Alberta) wrote, 9.8 years ago:

I am trying to extract a discrete label (0/1) for a classification (supervised learning) task, based on two pieces of information available for each patient in several different cancer-related studies: dfs.t and dfs.e. My main concern is how researchers fill in the dfs.e column for patients. This is what I think it means:

  • dfs.e = 0: no relapse/recurrence/distant metastasis within the dfs.t time frame

  • dfs.e = 1: relapse/recurrence/distant metastasis/death from cancer occurred at time dfs.t

Is this interpretation correct? Is there a conventional way of dealing with data like this?

Thanks in advance,


Tags: disease, classification • question modified 9.5 years ago by David Quigley
David Quigley (San Francisco) wrote, 9.8 years ago:

DFS usually means "disease-free survival" in a cancer context. The exact definition of an event can be tricky to pin down, though; some groups don't count a local (or contralateral) recurrence as a metastasis, so you have to read each paper carefully.

These data are often used for Kaplan-Meier analysis, where for each patient you need to know:

  1. If the event occurred, when did it occur?

  2. If the event did not occur, what is the last time point for which you have follow-up?
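To make those two requirements concrete, here is a minimal product-limit (Kaplan-Meier) estimator in plain Python rather than R. It is a sketch, not the survival package's code: it assumes `times` holds dfs.t as numbers and `events` holds dfs.e coded 1 for an observed event and 0 for a censored patient. The function name `kaplan_meier` and the toy data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of S(t).

    times:  follow-up time per patient (dfs.t)
    events: 1 if the event was observed at that time, 0 if censored (dfs.e)
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    events_at = Counter(t for t, e in zip(times, events) if e == 1)
    leaving_at = Counter(times)  # patients whose follow-up ends at t (event or censored)
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d > 0:
            # Product-limit step: S(t) = S(t-) * (1 - d / n_at_risk).
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Everyone whose follow-up ends at t leaves the risk set after t.
        at_risk -= leaving_at[t]
    return curve


# Hypothetical toy data: six patients, two of them censored.
dfs_t = [2, 3, 3, 5, 8, 9]
dfs_e = [1, 0, 1, 1, 0, 1]
print(kaplan_meier(dfs_t, dfs_e))
# S(t) steps down to 5/6, 2/3, 4/9 and 0 at t = 2, 3, 5 and 9
```

A censored patient (dfs.e = 0) never causes a drop in the curve; it only shrinks the number at risk for later time points, which is why the last follow-up time is needed.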

Usually this is stored as two columns (dfs.e and dfs.t, or similar names), as you've described. There is often a third column with a distinguishing category (treated/untreated, predicted good outcome/predicted bad outcome, etc.). The "survival" package in R is useful for this analysis.
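The distinguishing column is what groups get compared on. As a rough sketch of the comparison usually done next, here is a two-group log-rank test in plain Python; in R's survival package the rough counterparts are `Surv()`, `survfit()` for the curves and `survdiff()` for the test. The function name `logrank_test`, its arguments and the toy data are hypothetical, and tied event times use the usual hypergeometric variance.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("expected exactly two groups")
    first = labels[0]
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        # Risk set: patients whose follow-up time is at least t,
        # each tagged with whether they had an event exactly at t.
        at_risk = [
            (g, ti == t and e == 1)
            for ti, e, g in zip(times, events, groups)
            if ti >= t
        ]
        n = len(at_risk)
        n1 = sum(1 for g, _ in at_risk if g == first)
        d = sum(1 for _, ev in at_risk if ev)
        d1 = sum(1 for g, ev in at_risk if ev and g == first)
        observed += d1
        expected += d * n1 / n  # expected events in the first group under H0
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no information to compare the groups")
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical toy data: group "A" relapses early, group "B" later.
chi2, p = logrank_test([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1],
                       ["A", "A", "A", "B", "B", "B"])
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```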


I have seen the survival package and used it to plot KM curves and to run tests comparing survival times between two studies.

What I want is to divide patients into two distinct, meaningful groups based on their dfs.t and dfs.e values, and I cannot find anything useful on how to do this.

Reply written 9.8 years ago by Saman

From dfs.t and dfs.e you can extract "did/did not recur within a given time interval", which is a useful start. Don't choose the interval casually; look at clinical papers or talk to a specialist to find a clinically meaningful one. Finding the features to build your classifier is up to you.
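A minimal sketch of that extraction, assuming dfs.t and dfs.e are coded as in the question (dfs.e = 1 for an observed event, 0 for censored), that times are in months, and that 60 months is only a placeholder cutoff; `label_at_cutoff` and `make_labels` are hypothetical helper names. Patients censored before the cutoff have an unknown status at the cutoff, so this version drops them rather than calling them disease-free.

```python
def label_at_cutoff(dfs_t, dfs_e, cutoff):
    """Binary outcome at a fixed cutoff time.

    Returns 1 if the event was observed at or before the cutoff, 0 if the patient
    was followed up to the cutoff without an event, and None if the patient was
    censored before the cutoff (status at the cutoff is unknown).
    """
    if dfs_e == 1 and dfs_t <= cutoff:
        return 1
    if dfs_t >= cutoff:
        # Either censored at or after the cutoff, or the event happened after it.
        return 0
    return None


def make_labels(patients, cutoff):
    """Map patient id -> 0/1 label, dropping patients censored before the cutoff."""
    labels = {}
    for patient_id, dfs_t, dfs_e in patients:
        label = label_at_cutoff(dfs_t, dfs_e, cutoff)
        if label is not None:
            labels[patient_id] = label
    return labels


# Hypothetical patients: (id, dfs.t in months, dfs.e), with a 60-month cutoff.
patients = [("p1", 24, 1), ("p2", 80, 0), ("p3", 30, 0), ("p4", 72, 1), ("p5", 60, 0)]
print(make_labels(patients, cutoff=60))
# p3 is dropped (censored at 30 months); p1 -> 1; p2, p4, p5 -> 0
```

Dropping early-censored patients is the simplest option, but if many patients are lost to follow-up before the cutoff it can bias the remaining labels, so it is worth checking how many patients this step removes.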

Reply written 9.8 years ago by David Quigley

I agree with David that the clinical literature is a good place to look. For some diseases, 5-year or 10-year DFS is the metric that everyone reports. Another idea is to plot the frequency of events (dfs.e = 1) against their times (dfs.t). Do events accumulate linearly over time, or is there a point at which the rate of accumulation changes? That could inform your choice of cutoff.
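One way to get the numbers behind such a plot, sketched in plain Python with hypothetical function names and a bin width you would choose yourself (for example 12 if dfs.t is in months): count observed events per time bin and track the cumulative number of events, then plot either one with your tool of choice.

```python
from collections import Counter


def events_per_bin(dfs_t, dfs_e, bin_width):
    """Number of observed events (dfs.e == 1) in consecutive bins of width bin_width."""
    counts = Counter(int(t // bin_width) for t, e in zip(dfs_t, dfs_e) if e == 1)
    if not counts:
        return []
    # Include empty bins so that quiet periods stay visible in the plot.
    return [(b * bin_width, counts.get(b, 0)) for b in range(max(counts) + 1)]


def cumulative_events(dfs_t, dfs_e):
    """(time, cumulative number of events so far) at each observed event time."""
    event_times = sorted(t for t, e in zip(dfs_t, dfs_e) if e == 1)
    return [(t, i + 1) for i, t in enumerate(event_times)]


# Hypothetical data with dfs.t in months; 12-month bins.
dfs_t = [5, 12, 14, 30, 31, 33, 70]
dfs_e = [1, 1, 0, 1, 1, 1, 1]
print(events_per_bin(dfs_t, dfs_e, 12))
# [(0, 1), (12, 1), (24, 3), (36, 0), (48, 0), (60, 1)]
print(cumulative_events(dfs_t, dfs_e))
```

Keep in mind that raw event counts also fall simply because fewer patients remain under follow-up at later times; dividing each bin's count by the number of patients still at risk at the start of the bin gives a rough per-patient rate that is less affected by this.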

Reply written 8.6 years ago by Obi Griffith
Powered by Biostar version 2.3.0