Survival Differences by Tumor Stage in Breast Cancer

A Cox Proportional Hazards Analysis Using Real-World Data

Victoria Nguyen & Julaxis Love (Advisor: Dr. Cohen)

2026-04-21

What is Survival Analysis?

  • Models time‑to‑event outcomes (e.g., death, recurrence, progression)
  • Handles censored data, where the event is not observed for all patients
  • Necessary when follow‑up times vary, as in many clinical studies
  • Standard regression methods (e.g., linear or logistic) fall short because they cannot properly account for censoring
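
A small illustration of the censoring problem, sketched in Python on made-up numbers (the study itself was analyzed in R; the data and variable names here are hypothetical):

```python
# Hypothetical follow-up data: (months observed, event), where event = 1 is a
# death and event = 0 means the patient was still alive at last contact.
patients = [(6, 1), (12, 0), (18, 1), (24, 0), (30, 0)]

# Naive approach 1: average every time as if it were a death time.
naive_all = sum(t for t, _ in patients) / len(patients)

# Naive approach 2: drop censored patients and average only the deaths.
deaths_only = [t for t, e in patients if e == 1]
naive_deaths = sum(deaths_only) / len(deaths_only)

print(naive_all, naive_deaths)
```

A censored time is only a lower bound on the true survival time, so treating it as a death time (approach 1) or discarding it (approach 2) both tend to understate survival.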

Key Methods: Kaplan–Meier & Cox Model

Kaplan–Meier Estimator (1958)

  • Non‑parametric survival probability estimator

  • Useful for describing survival curves and comparing groups

  • Limitation: Cannot adjust for multiple covariates at once
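
The product-limit calculation behind the Kaplan–Meier estimator can be sketched as follows. This is a minimal Python illustration on hypothetical data, not the code used for the study (which was run in R); the function name `kaplan_meier` is an assumption made for this example.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event (death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= removed
    return curve

# Hypothetical data: months of follow-up and event indicators.
times = [6, 12, 18, 24, 30]
events = [1, 0, 1, 0, 0]
km = kaplan_meier(times, events)
print(km)
```

Censored patients never lower the curve directly; they only shrink the risk set used at later event times.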

Cox Proportional Hazards Model (1972)

  • Semi‑parametric regression model

  • Relates covariates to the hazard of an event

  • Does not require specifying the baseline hazard

  • Widely used due to flexibility and interpretability

Importance of the Cox Model in Breast Cancer Research

  • Breast cancer studies often involve variable follow‑up times and censored observations, making survival analysis essential

  • Cox model allows researchers to examine how clinical, demographic, and biological factors influence risk over time

  • Common predictors incorporated into the model:

    • Tumor stage and grade

    • Hormone receptor status (ER/PR)

    • Treatment type (surgery, chemotherapy, radiation, hormonal therapy)

    • Patient age and other demographic factors

Importance of the Cox Model in Breast Cancer Research Continued

  • Provides adjusted hazard estimates, enabling comparison of multiple predictors simultaneously

  • Helps identify prognostic factors associated with recurrence or mortality

  • Supports clinical decision‑making by clarifying which variables meaningfully affect patient outcomes

Methods

Cox Proportional Hazards Model

\[ h(t \mid X) = h_0(t)\exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p) \]

  • \(h_0(t)\): baseline hazard

  • \(X\): covariates (e.g., stage, age, tumor size, differentiation)

  • \(\exp(\beta_j)\): hazard ratio (HR) for covariate \(X_j\)
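
To make the formula concrete, the sketch below evaluates the exponential term for one patient. The coefficient values and covariate names are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math

# Hypothetical coefficients (log hazard ratios), NOT estimates from the study.
beta = {"age_per_10y": 0.10, "poorly_differentiated": 0.50}

def relative_hazard(x, beta):
    """exp(beta_1*x_1 + ... + beta_p*x_p): hazard relative to the baseline h0(t)."""
    linear_predictor = sum(beta[name] * x[name] for name in beta)
    return math.exp(linear_predictor)

# One hazard ratio per covariate: exp(beta_j).
hazard_ratios = {name: math.exp(b) for name, b in beta.items()}

# A hypothetical patient: 20 years above the reference age, poorly differentiated tumor.
patient = {"age_per_10y": 2.0, "poorly_differentiated": 1.0}
rh = relative_hazard(patient, beta)  # exp(0.10 * 2 + 0.50 * 1) = exp(0.70)
print(hazard_ratios, rh)
```

The baseline hazard never has to be specified to compare patients, because it is the same multiplier for everyone at a given time.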

Interpreting Hazard Ratios

  • HR > 1: increased risk of death

  • HR < 1: decreased risk

  • Represents multiplicative change in hazard for a one‑unit increase in a covariate

Hazard Ratio

The hazard ratio represents the multiplicative change in the hazard associated with a one‑unit increase in a covariate. Values greater than 1 indicate increased risk, while values less than 1 indicate reduced risk.

\[ \mathrm{HR} = e^{\beta} \]
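
Why exponentiating a coefficient gives the hazard ratio: compare two patients who differ by one unit in covariate \(X_j\) and are otherwise identical. The baseline hazard and the remaining covariate terms cancel (a worked step added for illustration, using the model's own symbols):

\[ \frac{h(t \mid X_j = x + 1)}{h(t \mid X_j = x)} = \frac{h_0(t)\exp\left(\beta_j (x + 1) + \sum_{k \ne j} \beta_k X_k\right)}{h_0(t)\exp\left(\beta_j x + \sum_{k \ne j} \beta_k X_k\right)} = e^{\beta_j} \]

For instance, an illustrative coefficient of \(\beta_j = 0.69\) gives \(\mathrm{HR} \approx 2\), about double the hazard, and a two-unit increase multiplies the hazard by \(e^{2\beta_j} \approx 4\).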

Predicted Survival Function

\[ \hat{S}(t \mid X) = \left[\hat{S}_0(t)\right]^{\exp(\beta X)} \]

  • Adjusts baseline survival curve according to patient‑specific covariates

  • Allows individualized survival predictions
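
A minimal sketch of how the formula turns a baseline curve into patient-specific curves. The baseline values and the function name `predicted_survival` are hypothetical, chosen only to show the arithmetic.

```python
import math

def predicted_survival(baseline_survival, linear_predictor):
    """Apply S(t | X) = S0(t) ** exp(beta X) at each time point."""
    multiplier = math.exp(linear_predictor)
    return [(t, s0 ** multiplier) for t, s0 in baseline_survival]

# Hypothetical baseline survival curve: (months, S0(t)).
baseline = [(12, 0.95), (36, 0.85), (60, 0.75)]

# Linear predictor 0 reproduces the baseline curve.
low_risk = predicted_survival(baseline, linear_predictor=0.0)

# Linear predictor log(2) corresponds to a hazard ratio of 2 versus baseline.
high_risk = predicted_survival(baseline, linear_predictor=math.log(2))
print(low_risk, high_risk)
```

Because survival probabilities lie between 0 and 1, an exponent above 1 (hazard ratio above 1) always pulls the curve downward.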

Modeling

  • Objective: Evaluate how tumor stage at diagnosis influences survival time in breast cancer patients

  • Hypothesis: Higher AJCC stage → worse survival outcomes

  • Dataset included:

    • Demographics: age, race

    • Clinical variables: tumor size, differentiation, ER/PR status

    • Nodal information: regional nodes examined & positive

    • Survival data: survival months, event status

Modeling Continued

  • Analytical Strategy:

    • Log‑rank tests to compare survival distributions across tumor stages

    • Cox proportional hazards model to quantify the association between tumor stage and hazard of death

    • Adjusted for relevant covariates to reduce confounding

  • Software: All analyses conducted in R

  • Model diagnostics: Checked proportional hazards assumption and overall model validity
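
The log-rank comparison in the strategy above can be sketched for the simplest case of two groups. This is an illustrative Python version on hypothetical data; the study compared several AJCC stages in R, which needs the multi-group form of the test, and the function name here is an assumption.

```python
import math

def logrank_two_groups(times, events, groups):
    """Two-group log-rank test; groups holds 0/1 labels.

    Returns (chi_square, p_value) on 1 degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed = expected = variance = 0.0
    for t in event_times:
        # Everyone still under observation just before time t.
        at_risk = [(g, ti, e) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == 1)
        d = sum(1 for _, ti, e in at_risk if ti == t and e == 1)
        d1 = sum(1 for g, ti, e in at_risk if ti == t and e == 1 and g == 1)
        observed += d1
        expected += d * n1 / n  # expected deaths in group 1 under no difference
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi_square = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical data: group 1 (e.g., a higher stage) tends to die earlier.
times = [8, 10, 12, 14, 2, 3, 5, 6]
events = [1, 0, 1, 0, 1, 1, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
chi_square, p_value = logrank_two_groups(times, events, groups)
print(chi_square, p_value)
```

The test compares observed deaths in one group with the number expected if both groups shared the same hazard at every event time.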

Assumptions of the Cox Model

Key Assumptions

  • Proportional hazards: HRs remain constant over time

  • Accurate event measurement: Correct survival times and censoring indicators

  • Independent censoring: Censoring unrelated to event risk

  • Correct model specification: Relevant covariates included; functional forms appropriate
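
One common way to probe the proportional hazards assumption is through Schoenfeld residuals. The sketch below computes them in a deliberately simplified setting: one covariate, no tied event times, and a coefficient supplied by the caller. It is an assumption-laden illustration, not the diagnostic code used in the study.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Schoenfeld residuals for a single covariate (assumes no tied event times).

    At each event time the residual is the covariate value of the patient who
    died minus the risk-weighted average covariate among those still at risk.
    """
    residuals = []
    for t_i, e_i, x_i in sorted(zip(times, events, x)):
        if e_i != 1:
            continue  # residuals are defined only at observed events
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        weights = [math.exp(beta * x_j) for x_j in risk]
        expected_x = sum(w * x_j for w, x_j in zip(weights, risk)) / sum(weights)
        residuals.append((t_i, x_i - expected_x))
    return residuals

# Hypothetical data with a binary covariate; beta = 0 keeps the arithmetic simple.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
x = [1, 0, 1, 0]
residuals = schoenfeld_residuals(times, events, x, beta=0.0)
print(residuals)
```

If hazards are proportional, the residuals should show no systematic trend when plotted against time; a clear slope suggests the covariate's effect changes over follow-up.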

Limitations of the Cox Model

Limitations

  • Assumption violations: Non‑proportional hazards can bias results

  • Informative censoring: If censoring relates to prognosis → biased estimates

  • Competing risks: Treating competing events as censoring can overestimate the probability of the event of interest

  • Small sample sizes: Rare events reduce model stability

  • Time‑dependent bias: Misclassified exposure time → immortal time bias
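
The competing-risks point can be illustrated with a small calculation. Note that this sketch uses the Kaplan–Meier analogue of the problem rather than a Cox model: it contrasts 1 minus the Kaplan–Meier estimate (competing events treated as censored) with the cumulative incidence that accounts for them. The data and function name are hypothetical.

```python
def one_minus_km_vs_cif(times, causes):
    """causes: 0 = censored, 1 = event of interest, 2 = competing event.

    Returns (naive, cif) at the end of follow-up. Assumes no tied times.
    """
    data = sorted(zip(times, causes))
    n = len(data)        # number currently at risk
    km_interest = 1.0    # KM for the event of interest, competing events censored
    overall = 1.0        # probability of being free of any event so far
    cif = 0.0            # cumulative incidence of the event of interest
    for _, c in data:
        if c == 1:
            # Probability of reaching this time event-free, then having the event.
            cif += overall * (1 / n)
            km_interest *= 1 - 1 / n
        if c in (1, 2):
            overall *= 1 - 1 / n
        n -= 1
    return 1 - km_interest, cif

# Hypothetical data: two deaths from the disease, two from other causes.
times = [1, 2, 3, 4]
causes = [2, 1, 2, 1]
naive, cif = one_minus_km_vs_cif(times, causes)
print(naive, cif)
```

In this toy example only half of the patients experience the event of interest, yet the naive estimate climbs well above that, because it implicitly assumes patients removed by the competing event could still have gone on to experience it.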

Dataset Overview

  • Real‑world breast cancer dataset with demographic, clinical, and tumor‑specific variables

Purpose: Provide the necessary inputs to evaluate how tumor stage influences survival while adjusting for key patient and tumor characteristics.

Research question: How does tumor stage at diagnosis influence survival time in breast cancer patients?

Hypothesis: Higher tumor stage is associated with significantly worse survival.

Dataset Overview Continued

  • Outcome variables:

    • Survival time (months)

    • Event status (death vs. censored)

  • Primary predictor:

    • AJCC 6th edition tumor stage

  • Covariates:

    • Age

    • Race

    • Tumor size

    • Tumor differentiation

    • Estrogen receptor status

Analytical Workflow

  1. Load and inspect dataset for completeness and consistency

  2. Prepare variables (e.g., convert categorical predictors to factors)

  3. Conduct exploratory data analysis to summarize demographics and tumor characteristics

  4. Generate Kaplan–Meier curves stratified by AJCC stage

  5. Perform log‑rank tests to compare survival distributions

  6. Fit Cox proportional hazards model adjusting for covariates

  7. Evaluate proportional hazards assumption using Schoenfeld residuals

  8. Visualize results (KM curves, hazard ratio forest plots)

Purpose: Ensure a rigorous, transparent process for modeling survival outcomes and interpreting the impact of tumor stage.
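
Step 6 of the workflow can be sketched in miniature. The example below fits a Cox model with a single covariate by Newton–Raphson on the partial likelihood. It is a teaching sketch on hypothetical data with an assumed function name; the study's adjusted model was fitted in R with several covariates.

```python
import math

def fit_cox_one_covariate(times, events, x, iterations=25):
    """Maximize the Cox partial likelihood for one covariate.

    Uses Newton-Raphson with Breslow-style risk sets.
    Returns (beta, standard_error).
    """
    beta = 0.0
    information = 0.0
    for _ in range(iterations):
        score = 0.0
        information = 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if e_i != 1:
                continue  # only observed events contribute terms
            # Risk set: everyone still under observation at time t_i.
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x_j for w_j, x_j in zip(w, risk))
            s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk))
            mean = s1 / s0
            score += x_i - mean                  # first derivative term
            information += s2 / s0 - mean ** 2   # negative second derivative term
        beta += score / information
    return beta, 1 / math.sqrt(information)

# Hypothetical data: x = 1 marks a higher-risk group; all events observed.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 1, 0, 0]
beta_hat, se = fit_cox_one_covariate(times, events, x)

hazard_ratio = math.exp(beta_hat)
ci_low = math.exp(beta_hat - 1.96 * se)
ci_high = math.exp(beta_hat + 1.96 * se)
print(hazard_ratio, ci_low, ci_high)
```

The hazard ratio and its 95% confidence interval computed at the end are the quantities a forest plot (step 8) displays for each predictor. With only six patients the interval is very wide, which echoes the small-sample limitation noted earlier.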

Kaplan–Meier Survival Curves by Stage

Kaplan–Meier Survival Curves by Estrogen Receptor Status

Cox Proportional Hazards Model

Explanation: Forest plot showing significant and non‑significant predictors of breast cancer survival.

Interpretation of Findings

  • The Cox model did not show AJCC tumor stage as a significant independent predictor of survival after adjustment

  • This contrasts with the original hypothesis and suggests that stage’s prognostic effect may be explained by other tumor characteristics

  • Tumor differentiation was significantly associated with survival, with poorer differentiation linked to higher hazard of death

Interpretation of Findings Continued

  • Estrogen receptor status also emerged as a significant predictor, highlighting the importance of tumor biology

  • Age, race, and tumor size were not significant in the adjusted model

  • Diagnostics showed no evidence against the proportional hazards assumption, consistent with hazard ratios that are stable over time

Clinical Implications

  • Results emphasize the importance of comprehensive tumor profiling, not stage alone, in predicting survival

  • Significant effects of differentiation and hormone receptor status highlight the value of biological markers in treatment planning

  • Findings suggest that traditional staging may not fully capture prognosis when biological features are considered

  • Supports individualized care strategies that integrate pathology, receptor status, and tumor behavior

  • Demonstrates how survival analysis can reveal nuanced relationships in real‑world clinical datasets

Summary of Key Findings

  • AJCC tumor stage was not a significant independent predictor of survival in the adjusted model

  • The original hypothesis was not supported by the multivariable analysis

  • Tumor differentiation and estrogen receptor status were the only significant predictors of mortality

  • The Cox model provided a clear framework for evaluating multiple clinical and pathological factors simultaneously

  • Results highlight the multi-factorial nature of breast cancer prognosis

Final Takeaways

  • Prognosis in breast cancer is shaped by biological characteristics, not just anatomical stage

  • Hormone receptor status and tumor differentiation play key roles in survival and treatment decisions

  • Comprehensive diagnostic evaluation remains essential for accurate risk stratification

  • Quantifying risk across multiple tumor features supports personalized follow‑up and management

  • The Cox model remains a powerful tool for analyzing survival in complex clinical datasets