Question: Disease Free Survival Time & Event
Saman (U of Alberta) wrote 8.4 years ago:

I am trying to extract a discrete label (0/1) for a classification (supervised learning) task from two pieces of information available for each patient, dfs.t and dfs.e, in different cancer-related studies. My main concern is how researchers fill in the dfs.e column for patients. This is what I think:

  • dfs.e = 0: no relapse/recurrence/distant-metastasis within dfs.t time frame

  • dfs.e = 1: relapse/recurrence/distant metastasis/death-caused-by-cancer occurred at dfs.t

Is this interpretation right? Is there a conventional way of dealing with data like this?

Thanks in advance,

--Saman

disease classification • 3.3k views
David Quigley (San Francisco) wrote 8.4 years ago:

DFS usually means "disease-free survival" in a cancer context. The exact meaning can be tricky to pin down, though; some groups don't consider a local (or contralateral) recurrence a metastasis, so you have to read the paper carefully.

These data are often used for Kaplan-Meier analysis, where you need to know

  1. If the event occurred, when did it occur?

  2. If the event did not occur, what is the last time point for which there is follow-up?

Usually this information is stored in two columns (dfs.e and dfs.t, or some other names), as you've described. There is also usually a third column with some distinguishing category (treated/untreated, predicted good outcome/predicted bad outcome, etc.). The "survival" package in R is useful for this analysis.
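To make the two-column layout concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimate in plain Python. It is an illustration only, not the R "survival" package; the function name `kaplan_meier` and the toy values are assumptions, and it follows the usual convention that a patient censored at time t still counts as at risk at t.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time per patient (dfs.t)
    events -- 1 if the event occurred at that time, 0 if censored (dfs.e)
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    # Number of events observed at each distinct time.
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    # Every patient leaves the risk set at their own time, event or not.
    exit_counts = Counter(times)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Remove events and censored patients from the risk set after t.
        at_risk -= exit_counts[t]
    return curve


# Hypothetical toy data: five patients, times in arbitrary units.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 0, 1, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

Censored patients (dfs.e = 0) never lower the curve themselves; they only shrink the risk set for later event times, which is why the last follow-up time matters even when no event occurred.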


I have seen the survival package and used it to plot KM curves and run tests comparing survival times in two studies.

What I am interested in is dividing patients into two meaningful, distinct groups based on their dfs.t and dfs.e values. I cannot find anything useful in this regard!

Reply written 8.4 years ago by Saman

From dfs.t and dfs.e you can extract "did/did not recur in a given time interval", which is a useful start. Selecting the time interval shouldn't be done casually; look at clinical papers or talk to a specialist to find out a meaningful interval. Finding the features to build your classifier is up to you...
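One possible way to turn "did/did not recur in a given time interval" into a 0/1 label is sketched below. The function name `label_at_cutoff`, the 60-month cutoff, and the toy patients are assumptions for illustration; the cutoff in particular should come from the clinical literature. Patients censored before the cutoff get no label, because their status at the cutoff is unknown.

```python
def label_at_cutoff(dfs_t, dfs_e, cutoff):
    """Label one patient relative to a fixed time cutoff.

    Returns 1    if the event occurred at or before the cutoff,
            0    if the patient was followed event-free up to the cutoff,
            None if the patient was censored before the cutoff (unknown).
    """
    if dfs_e == 1 and dfs_t <= cutoff:
        return 1  # recurred within the interval
    if dfs_t >= cutoff:
        # Either censored at/after the cutoff, or the event came later:
        # in both cases the patient was event-free through the cutoff.
        return 0
    return None  # lost to follow-up before the cutoff


# Hypothetical patients as (dfs.t in months, dfs.e).
patients = [(12, 1), (70, 0), (30, 0), (80, 1)]
labels = [label_at_cutoff(t, e, cutoff=60) for t, e in patients]
print(labels)  # [1, 0, None, 0]
```

Dropping the `None` cases is the simplest option, but it discards data and can bias the classifier if early censoring is related to outcome, so it is worth reporting how many patients fall into that group.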

Reply written 8.4 years ago by David Quigley

I agree with David that looking to the clinical literature is a good idea. For some diseases, 5-year or 10-year DFS is the metric that everyone cares and talks about. Another idea is to plot the frequency of events (where dfs.e = 1) against their time (dfs.t). Is there a linear accumulation of events over time? Or is there a point at which the rate of accumulation changes? That could inform your choice of cutoff.
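A rough sketch of that idea, tabulating events over time before plotting them, might look like the following. The function names, the 12-unit interval width, and the toy data are assumptions; note that raw event counts ignore censoring, so a flattening curve can also reflect fewer patients still under follow-up.

```python
def cumulative_events(times, events):
    """Return [(t, number of events observed up to and including t)]."""
    event_times = sorted(t for t, e in zip(times, events) if e == 1)
    curve = []
    for count, t in enumerate(event_times, start=1):
        if curve and curve[-1][0] == t:
            curve[-1] = (t, count)  # tied event times share one point
        else:
            curve.append((t, count))
    return curve


def events_per_interval(times, events, width):
    """Count events in consecutive intervals [start, start + width)."""
    counts = {}
    for t, e in zip(times, events):
        if e == 1:
            start = int(t // width) * width
            counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical toy data: times in months, one censored patient at 20.
toy_times = [3, 5, 5, 14, 20, 40]
toy_events = [1, 1, 1, 1, 0, 1]
print(cumulative_events(toy_times, toy_events))
print(events_per_interval(toy_times, toy_events, width=12))
```

A roughly constant count per interval suggests linear accumulation, while a sharp drop after some interval points to a candidate cutoff worth checking against the clinical literature.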

Reply written 7.2 years ago by Obi Griffith