Hi all--long-time lurker, first-time poster. I'm a grad student trying to replicate a published analysis of TCGA data (found here), and I have a dumb/newbie question about hazard ratios and differential expression, which then turns into a larger did-I-screw-this-up question about my own analysis.

In Figure 2, the authors give hazard ratios from a univariate analysis of gene expression against survival, annotated so that HR>1 indicates that higher expression is associated with better survival. They state that the Cox model was run on gene expression and time to recurrence, which I would have thought means that HR>1 actually indicates worse survival (a higher hazard of recurrence) with increased expression, contrary to the annotation on the forest plot. Am I off base there?
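To make my reading concrete, here is a minimal sketch in Python on simulated data (not TCGA, and not the authors' code). Everything in it is an assumption for illustration: `simulate` and `fit_cox_univariate` are my own hypothetical helpers, expression is a made-up standardized covariate, and the Cox fit is a hand-rolled Newton-Raphson on the partial likelihood rather than a library call. The data are generated so that higher expression shortens time to recurrence, and I would expect the fitted HR to come out above 1.

```python
import math
import random
import statistics


def simulate(n=400, beta=0.7, seed=0):
    """Simulate (expression, observed_time, recurred) with hazard = h0 * exp(beta * expression)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)               # standardized expression (hypothetical)
        rate = 0.01 * math.exp(beta * x)      # beta > 0: higher expression -> higher hazard
        t_event = rng.expovariate(rate)       # true time to recurrence
        t_cens = rng.expovariate(0.005)       # independent censoring time
        rows.append((x, min(t_event, t_cens), t_event <= t_cens))
    return rows


def fit_cox_univariate(rows, iters=25):
    """Estimate the log hazard ratio by Newton-Raphson on the Cox partial log-likelihood.

    Assumes no tied event times, which holds here because times are continuous.
    """
    data = sorted(rows, key=lambda r: r[1], reverse=True)  # longest follow-up first
    beta = 0.0
    for _ in range(iters):
        s0 = s1 = s2 = 0.0     # running sums over the risk set
        grad = info = 0.0      # score and observed information
        for x, _, recurred in data:
            w = math.exp(beta * x)
            s0 += w
            s1 += w * x
            s2 += w * x * x
            if recurred:
                mean = s1 / s0
                grad += x - mean
                info += s2 / s0 - mean * mean
        beta += grad / info
    return beta


rows = simulate()
beta_hat = fit_cox_univariate(rows)
hazard_ratio = math.exp(beta_hat)

# Median split on expression, as one would do before drawing KM curves.
cutoff = statistics.median([x for x, _, _ in rows])
median_high = statistics.median([t for x, t, _ in rows if x > cutoff])
median_low = statistics.median([t for x, t, _ in rows if x <= cutoff])

print(f"HR per unit expression: {hazard_ratio:.2f}")
print(f"median observed time, high expression: {median_high:.0f}")
print(f"median observed time, low expression:  {median_low:.0f}")
```

In this toy setup the hazard is the hazard of recurrence, so an HR above 1 goes together with shorter observed times in the high-expression group, which is why the "better survival" annotation confuses me.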

I'm asking in part because I'm very close to matching their Kaplan-Meier curves for survival (Figure 3), but in my analysis the curves for high/low expression are flipped relative to the authors', such that high expression leads to worse survival. This would make sense to me if, in fact, HR>1 means a higher chance of recurrence with increased expression. I should note that while the survival curves are flipped in my own analysis, I've been able to replicate the HRs fairly closely (at least in terms of >1 or <1) by using FPKM ~ days to recurrence for each candidate gene, as the authors seem to have done.
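Since the flip could just as easily come from my group labels, here is the kind of sanity check I have in mind, again as a sketch on a tiny made-up cohort rather than TCGA data. The `kaplan_meier` and `survival_at` helpers, the tuple layout, and all the numbers are hypothetical. It splits at the median expression and estimates recurrence-free survival per group, so I can confirm which curve really belongs to the "high" group.

```python
import statistics


def kaplan_meier(times, events):
    """Return [(time, S(time))] at each distinct event time (product-limit estimator)."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += 1 if pairs[i][1] else 0
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving   # events and censored subjects both leave the risk set
    return curve


def survival_at(curve, t):
    """Step-function lookup: estimated survival just after time t."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
    return s


# Hypothetical cohort: (FPKM, days to recurrence or last follow-up, recurred?)
cohort = [
    (12.0, 100, True), (11.0, 150, True), (10.5, 200, False), (9.8, 250, True),
    (3.1, 300, False), (2.5, 400, True), (2.0, 500, False), (1.2, 600, True),
]

cutoff = statistics.median([fpkm for fpkm, _, _ in cohort])
high = [(d, e) for fpkm, d, e in cohort if fpkm > cutoff]
low = [(d, e) for fpkm, d, e in cohort if fpkm <= cutoff]

high_curve = kaplan_meier([d for d, _ in high], [e for _, e in high])
low_curve = kaplan_meier([d for d, _ in low], [e for _, e in low])

print("high expression:", high_curve)
print("low expression: ", low_curve)
```

In this made-up cohort the high-expression group recurs earlier, so its curve sits below the low-expression one. A swapped legend label at the plotting step would make the curves trade places without touching the Cox fit, so that would be one thing to check.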

To this amateur's eyes it seems like either I or the authors have something backwards. I've looked over my inputs to the graphs/models and I can't find any obvious errors in evaluating time to recurrence or high/low expression.

I'm not alleging malfeasance on the part of the authors (or malpractice by the peer reviewers)--my intuition is that I've goofed somewhere, but I've spent enough time looking at this without success that I'm turning to the internet for help. Does it seem like I missed something with respect to the Cox model/hazard ratios? If so, I can go back and triple-check my code for the KM curves. I'm hoping to use TCGA data in some future work, and if I've got it all turned around and backwards at this point I'd like to know! My goal in replicating this was mostly just to get a feel for working with the TCGA data.

