Abstract
Background: Coronary artery bypass grafting (CABG) and surgical aortic valve replacement (AVR) are the 2 most common cardiac surgery procedures in North America. We derived and externally validated clinical models to estimate the likelihood of death within 30 days of CABG, AVR or combined CABG + AVR.
Methods: We obtained data from the CorHealth Ontario Cardiac Registry and several linked population health administrative databases from Ontario, Canada. We derived multiple logistic regression models from all adult patients who underwent CABG, AVR or combined CABG + AVR from April 2017 to March 2019, and validated them in 2 temporally distinct cohorts (April 2015 to March 2017 and April 2019 to March 2020).
Results: The derivation cohorts included 13 435 patients who underwent CABG (30-d mortality 1.73%), 1970 patients who underwent AVR (30-d mortality 1.68%) and 1510 patients who underwent combined CABG + AVR (30-d mortality 3.05%). The final models for predicting 30-day mortality included 15 variables for patients undergoing CABG, 5 variables for patients undergoing AVR and 5 variables for patients undergoing combined CABG + AVR. Model discrimination was excellent for the CABG (c-statistic 0.888, optimism-corrected 0.866), AVR (c-statistic 0.850, optimism-corrected 0.762) and CABG + AVR (c-statistic 0.844, optimism-corrected 0.776) models, with similar results in the validation cohorts.
Interpretation: Our models, leveraging readily available, multidimensional data sources, computed accurate risk-adjusted 30-day mortality rates for CABG, AVR and combined CABG + AVR, with discrimination comparable to more complex American and European models. The ability to accurately predict perioperative mortality rates for these procedures will be valuable for quality improvement initiatives across institutions.
Coronary artery bypass grafting (CABG) and surgical aortic valve replacement (AVR) are 2 of the most common cardiac surgical procedures in North America.1 Accurate risk models of perioperative mortality for CABG and AVR are not only useful for operative decision-making,2 but also valuable for quality improvement initiatives across surgeons and institutions.
In North America, the most widely used 30-day mortality risk score is the Society of Thoracic Surgeons (STS)–Predicted Risk of Mortality tool, derived from more than 1000 hospitals in the United States and encompassing more than 50 variables.3 An ideal risk model should be built and validated on the patient population in which it will be applied. Although the STS–Predicted Risk of Mortality tool was derived from a large surgical population, regional differences in patient sociodemographics and health care delivery systems may preclude this model from performing optimally in a health system where cardiac surgery is publicly funded. Furthermore, collecting more than 50 variables is resource intensive and is not feasible for all institutions. Similar limitations apply to the EuroSCORE II, which was derived from a population-based cohort in Europe.4 Given these limitations, we developed a more parsimonious model using readily available, linked clinical and administrative data sets in Ontario, Canada, to efficiently and accurately calculate risk-adjusted 30-day mortality rates for the purpose of province-wide quality improvement after CABG, AVR and combined CABG + AVR.
Methods
Study design and population
We conducted a retrospective analysis of patients aged 18 years and older and eligible for Ontario’s public health insurance plan, who were identified in the CorHealth Ontario Cardiac Registry as having had CABG, AVR or combined CABG + AVR surgery between Apr. 1, 2015, and Mar. 31, 2020.5,6 CorHealth is a provincial organization with a mandate to collect health data from all patients undergoing cardiac procedures and to provide strategic leadership to improve cardiac, stroke and vascular care in Ontario. This mandatory registry contains demographic, clinical and perioperative information on all patients who undergo major cardiovascular procedures and related cardiac interventions in Ontario.7–9
Our derivation cohort consisted of patients who underwent cardiac surgery between Apr. 1, 2017, and Mar. 31, 2019. Two temporally distinct validation cohorts included patients who underwent procedures between Apr. 1, 2015, and Mar. 31, 2017, and between Apr. 1, 2019, and Mar. 31, 2020. For each patient, we considered only the first surgical procedure in a given fiscal year. We confirmed surgical procedures performed using Canadian Classification of Health Interventions procedure codes, through linkage to the Canadian Institute for Health Information Discharge Abstract Database, which contains demographic, diagnostic and procedural information from the discharge abstracts of all acute care hospital admissions in Ontario; and the Ontario Health Insurance Plan Physician Claims Database, which contains information from nearly all physician encounters, diagnostic tests and outpatient laboratory services performed in Ontario. We excluded surgical procedures not recorded in the Discharge Abstract Database or Ontario Health Insurance Plan database, or for which CABG, AVR or combined CABG + AVR was performed concurrently with other cardiac procedures.
Outcome
Our primary outcome was 30-day all-cause mortality, captured from Ontario’s Registered Persons Database. This is a registry maintained by the Ontario Ministry of Health, containing demographic information about every individual who has ever been registered for the Ontario Health Insurance Plan, including their eligibility and dates of death. Registration is required to access publicly funded health care services in the province.
Candidate variable selection
We identified potential variables to be included in our mortality model from a review of predictors in previously published models or those deemed clinically important by our co-author group.4,10–13 In addition to key demographic variables (age, sex and ethnicity), we developed a list of 63 variables and forwarded it to members of the CorHealth Ontario Cardiac Surgery Risk Adjustment Task Group for further selection through a modified Delphi process.14,15 The task group comprises clinical-, administrative- and system-level leadership, with representatives from cardiac surgery centres across the province. It serves to advise CorHealth Ontario on risk-adjustment models for key quality indicators and clinical variables to be used in the monitoring and reporting of quality of care and outcomes of cardiac surgery. We first asked respondents to rate each of the variables as important or not in the risk stratification process (Appendix 1, Supplemental Table 1, available at www.cmaj.ca/lookup/doi/10.1503/cmaj.202901/tab-related-content). If an organization had more than 1 representative in the task group, we asked that 1 electronic survey be returned on behalf of all its members. Respondents were also able to suggest variables not already on the list. We then reviewed a summary of results from responses received from 7 of 15 organizations (47% response rate) in a subsequent task group teleconference, where a final list of 57 candidate variables was created through consensus-based discussion. Further refinement to combine similar variables — for example, previous stroke with previous transient ischemic attack — resulted in 49 candidate variables for model development (48 for the CABG model, owing to exclusion of endocarditis) (Appendix 1, Supplemental Table 2).
Data sources
Data sources for candidate variables are provided in Appendix 1, Supplemental Table 2. We used the CorHealth Registry, the Discharge Abstract Database, the National Ambulatory Care Reporting System and the Ontario Health Insurance Plan database to obtain baseline demographics and comorbidities in addition to identifying our study population.16,17 Other data sources included the Ontario Laboratories Information System for laboratory information; the Canadian Institute for Health Information Same-day Surgery database for day procedure history; the Ontario Cancer Registry for cancer and radiation treatment history; and the Ontario Visible Minority Database for ethnicity.18 These data sets were linked using unique, encoded identifiers and analyzed at ICES (formerly Institute for Clinical Evaluative Sciences). Administrative codes and definitions used for variables, validation study results (where available) and variable formats are provided in Appendix 1, Supplemental Table 2.
Statistical analysis
In each model, we first performed unadjusted logistic regression to select potential predictors of 30-day mortality for each procedure of interest separately. We then entered candidate variables into a multivariable logistic regression model with backward selection and a significance threshold of p < 0.05.19 Where missing, we imputed values using the procedure- and sex-specific cohort mean (Appendix 1, Supplemental Table 2). We reviewed resulting models for face and content validity and selected final covariates based on statistical and clinical importance, as determined by the task group. For continuous variables, we examined their association with 30-day mortality using cubic spline analyses with 5 knots at the 5th, 27.5th, 50th, 72.5th and 95th percentiles. We entered linear variables (age, body surface area, hematocrit, leukocytes) into the models as continuous values, but treated nonlinear variables (Hospital Frailty Risk Score, body mass index, platelets) categorically based on their distribution in tertiles and clinically meaningful ranges.20,21 We report odds ratios, 95% confidence intervals (CIs) and p values for final covariates in each model.
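The imputation step described above can be illustrated with a minimal Python sketch. It is not the analytic code used in this study (the analyses were run in SAS); the function name, the record layout and the example hematocrit values are hypothetical and chosen only to show how a missing value is replaced by the mean of patients with the same procedure and sex.

```python
from collections import defaultdict
from statistics import mean


def impute_group_mean(records, variable, group_keys=("procedure", "sex")):
    """Return copies of records with missing `variable` set to the group mean.

    Hypothetical helper: groups are defined by procedure and sex, mirroring
    the procedure- and sex-specific cohort mean described in the text.
    """
    # Collect observed (non-missing) values for each procedure/sex group.
    observed = defaultdict(list)
    for rec in records:
        if rec[variable] is not None:
            key = tuple(rec[k] for k in group_keys)
            observed[key].append(rec[variable])
    group_means = {key: mean(vals) for key, vals in observed.items()}

    # Fill missing values without modifying the input records.
    imputed = []
    for rec in records:
        new_rec = dict(rec)
        if new_rec[variable] is None:
            key = tuple(new_rec[k] for k in group_keys)
            # Raises KeyError if a group has no observed values at all.
            new_rec[variable] = group_means[key]
        imputed.append(new_rec)
    return imputed


# Made-up example records (not study data).
patients = [
    {"procedure": "CABG", "sex": "F", "hematocrit": 0.36},
    {"procedure": "CABG", "sex": "F", "hematocrit": 0.40},
    {"procedure": "CABG", "sex": "F", "hematocrit": None},
    {"procedure": "AVR", "sex": "M", "hematocrit": 0.44},
    {"procedure": "AVR", "sex": "M", "hematocrit": None},
]
completed = impute_group_mean(patients, "hematocrit")
```

In this sketch the third patient receives the mean of the two observed values in the same procedure and sex group, and the original records are left unchanged.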
In both derivation and validation cohorts, we evaluated model discrimination using the c-statistic. For internal validation in the derivation sample, we computed optimism-corrected c-statistics using 250 bootstrap samples. We assessed calibration using the Hosmer–Lemeshow χ2 statistic, Brier score, calibration slope and a calibration curve, comparing observed versus expected mortality rates across deciles of expected risk. We also assessed the performance of the STS model in our derivation and validation cohorts. Roughly half of the hospitals were collecting STS data at the time, and these data were available to us. For all other hospitals, we mapped as many of the STS variables as possible to existing data sources at ICES. We then used these mapped variables to estimate risk with the STS model, so that the entire cohort was included in the STS calculation. We conducted all analyses using SAS version 9.4 (SAS Institute, Cary, NC).
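For readers less familiar with these performance measures, the following Python sketch shows one conventional way to compute a c-statistic, a Brier score and the observed versus expected deaths across groups of predicted risk. It is an illustration under stated assumptions, not the SAS code used in this study: the function names are hypothetical, ties in predicted risk are counted as half-concordant, and the 6 example patients are made up.

```python
def c_statistic(outcomes, predicted):
    """Share of (death, survivor) pairs in which the patient who died had
    the higher predicted risk; tied predictions count as 0.5."""
    events = [p for y, p in zip(outcomes, predicted) if y == 1]
    non_events = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def brier_score(outcomes, predicted):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, predicted)) / len(outcomes)


def observed_vs_expected_by_group(outcomes, predicted, n_groups=10):
    """Sort patients by predicted risk, split them into n_groups groups of
    near-equal size, and return (observed deaths, expected deaths) per group."""
    ranked = sorted(zip(predicted, outcomes))
    n = len(ranked)
    rows = []
    for g in range(n_groups):
        chunk = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        if chunk:  # groups can be empty when n < n_groups
            rows.append((sum(y for _, y in chunk), sum(p for p, _ in chunk)))
    return rows


# Made-up example: 1 = died within 30 days, 0 = survived.
died = [0, 0, 1, 0, 1, 0]
risk = [0.01, 0.02, 0.30, 0.05, 0.04, 0.01]

c_stat = c_statistic(died, risk)   # 7 of 8 pairs are concordant
brier = brier_score(died, risk)
groups = observed_vs_expected_by_group(died, risk, n_groups=3)
```

The pairwise loop is quadratic in cohort size and is written for clarity; a rank-based calculation would be preferable for cohorts of the size analyzed here. With `n_groups=10`, the third function yields the decile comparison that underlies a calibration plot.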
Ethics approval
The use of these data was authorized under section 45 of Ontario’s Personal Health Information Protection Act, which does not require review by a Research Ethics Board.
Results
The derivation cohorts included 13 435 patients who underwent CABG, 1970 patients who underwent AVR, and 1510 patients who underwent combined CABG + AVR (Figure 1). The sample size, number of deaths and proportion of patients who died in the derivation and validation cohorts are shown in Table 1. The baseline characteristics were similar between the derivation and validation cohorts across all groups (Appendix 1, Supplemental Tables 3–5).
Predictors of 30-day mortality after isolated CABG
History of percutaneous coronary intervention and left ventricular ejection fraction were forced into the model on the basis of clinical significance. Of the candidate covariates evaluated, older age, female sex, Hospital Frailty Risk Score,20 renal insufficiency, thrombocytopenia, atrial arrhythmia, chronic lung disease, peripheral arterial disease, cerebrovascular disease, previous CABG, percutaneous coronary intervention within 1 day before surgical revascularization, thoracic aortic disease, preoperative cardiogenic shock and moribund status22 were predictors of 30-day CABG mortality (Table 2).
The c-statistic was 0.888 in the derivation data set (optimism-corrected 0.866), indicating excellent discrimination,23 and the Hosmer–Lemeshow χ2 statistic p value was 0.2, indicating no evidence of lack of model fit. These metrics of performance remained robust in both validation cohorts (Table 3). Supplemental Figure 1a in Appendix 1 shows the calibration plot of observed versus expected rates of 30-day CABG mortality according to each decile of risk, and Table 3 shows the predicted probability for the derivation and validation samples. The observed and predicted numbers of deaths were similar across all except the highest risk decile, in which the model tended to overestimate mortality.
In comparison, when fitted to the derivation cohort, the c-statistic of the STS model was 0.816, and the Hosmer–Lemeshow p value was 0.6. The c-statistic of the STS model was 0.841 and 0.863 in each of the validation cohorts.
Predictors of 30-day mortality after isolated AVR
Sex was forced into the model on the basis of clinical significance. The multivariable predictors of 30-day mortality were frailty, leukocytosis, liver disease and preoperative cardiogenic shock (Table 4).
The c-statistic was 0.850 in the derivation data set (optimism-corrected 0.762) and the Hosmer–Lemeshow χ2 statistic p value was 0.08. These metrics of performance remained robust in both validation cohorts (Table 3). Supplemental Figure 1b (Appendix 1) shows the calibration plot of observed versus expected rates of 30-day AVR mortality according to each decile of risk. The observed and predicted numbers of deaths were similar across all risk deciles.
In comparison, when fitted to the derivation cohort, the c-statistic of the STS model was 0.861, and the Hosmer–Lemeshow p value was 0.3. The c-statistic of the STS model was 0.846 in the first validation cohort. There was an insufficient number of events to validate the STS model in the second validation cohort.
Predictors of 30-day mortality after combined CABG + AVR
Sex and a history of previous CABG were forced into this model on the basis of clinical significance. Other multivariable predictors of 30-day mortality were frailty, anemia and preoperative cardiogenic shock (Table 5).
The c-statistic was 0.84 in the derivation data set (optimism-corrected 0.764) and the Hosmer–Lemeshow χ2 statistic p value was 0.7. These metrics of performance remained robust in both validation cohorts (Table 3). Supplemental Figure 1c in Appendix 1 shows the calibration plot of observed versus expected rates of 30-day combined CABG + AVR mortality according to each decile of risk. The observed and predicted numbers of deaths were similar across all except the middle risk decile, in which the model tended to overestimate mortality.
In comparison, when fitted to the derivation cohort, the c-statistic of the STS model was 0.828, and the Hosmer–Lemeshow p value was 0.3. The c-statistic of the STS model was 0.881 in the first validation cohort. There was an insufficient number of events to validate the STS model in the second validation cohort.
Interpretation
We found that multidimensional data sources consisting of readily available clinical registry and administrative health databases can be used to develop 30-day mortality risk models for CABG, AVR and combined CABG + AVR, with excellent performance. We found that the Ontario CABG model was the best-performing model with a c-statistic of 0.888, while those for AVR and combined CABG + AVR also predicted well, with c-statistics of 0.850 and 0.844, respectively, and performed consistently in the validation cohorts. By comparison, the STS CABG model did not perform as well in Ontario, with a c-statistic of 0.816, while its performance for AVR (c-statistic 0.861) and CABG + AVR (c-statistic 0.828) was comparable to the Ontario models in the derivation data set.12
Several aspects of our models are novel, compared with existing perioperative mortality models. First, the incorporation of frailty in our models represents a major advance in the field. Indeed, cardiac surgery literature7,24,25 cites the exclusion of frailty as a major limitation of commonly used cardiac surgery risk scores. Second, the Ontario models achieved parsimony without sacrificing performance, which allows for efficient assessment of the quality of surgical care. Our CABG model included only 15 predictors as compared with more than 50 in the STS model, while our CABG + AVR and AVR models each included only 5 predictor variables. The large number of variables needed to risk-adjust using the STS model is a limitation, and only half of Ontario surgical hospitals participate in the STS data collection. Third, we were able to derive these models using routinely collected data that are readily available across all cardiac care institutions, without loss to follow-up. In contrast, collecting the data elements necessary for the STS model would require additional infrastructure, resources and personnel time to be put in place across all cardiac centres. Lastly, our models were developed by an interdisciplinary team with complementary expertise in cardiac surgery, cardiac anesthesiology, cardiology and clinical administration, for the purpose of quality assessment across centres. This differs from the other risk scores, which were derived primarily for preoperative risk assessment and operative decision-making.
Our modelling methodology has an additional unique strength. In contrast with other commonly used universal mortality prediction models, such as the EuroSCORE II and the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) score, which were based on pooled sets of diverse surgical procedures (whereby each procedure type is treated as a model covariate), our prediction models were procedure specific.26 Universal models may be particularly useful when event rates are low. The ACS NSQIP model includes more than 100 procedures, encompassing a wide variety of surgical specialties. By combining procedures into sets, more covariates could be added to potentially improve model performance.27 However, this approach also makes the limiting assumption that the effect of each predictor is the same across procedures, which may not reflect reality. In addition, the EuroSCORE II allows for calculation of the risk for almost any combination of cardiac surgical procedures, as each procedure is treated as a model covariate.28 Although this method of modelling may produce a c-statistic that is acceptable overall, its predictive performance for individual procedures is poor, especially for procedures that differ technically.29 The fact that guidelines recommend using the STS–Predicted Risk of Mortality score (Class I) over the EuroSCORE II (Class IIb) for the prediction of 30-day mortality after CABG reflects the potential importance of procedure-specific models.30
Risk prediction models serve several purposes. In addition to informing treatment decision-making, they can be used for risk adjustment to allow evaluation of and reporting on quality-of-care outcomes.31,32 Risk-adjusted rates for surgical procedures can be expressed as a ratio of the observed versus the expected number of deaths after patient characteristics have been adjusted for. These risk-adjusted measures enhance comparability of outcomes within and among institutions, and can be used to assess quality of surgical care. Reports of risk-adjusted mortality rates are now part of the standard repertoire to help facilitate high-quality surgical care and quality improvement initiatives.
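As a worked illustration of the observed-to-expected ratio, the Python sketch below sums observed deaths and model-predicted risks within each institution. The function name, hospital labels and risk values are hypothetical and deliberately exaggerated so that the arithmetic is easy to follow; this is a sketch of the general technique, not the reporting method used by any specific program.

```python
from collections import defaultdict


def observed_expected_ratios(cases):
    """cases: iterable of (hospital, died, predicted_risk) tuples.

    Returns {hospital: (observed deaths, expected deaths, O/E ratio)}.
    Expected deaths are the sum of each patient's predicted 30-day risk.
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for hospital, died, risk in cases:
        observed[hospital] += died
        expected[hospital] += risk
    return {
        h: (observed[h], expected[h], observed[h] / expected[h])
        for h in observed
    }


# Made-up cases: (hospital, died within 30 days, predicted risk).
cases = [
    ("Hospital A", 1, 0.25), ("Hospital A", 0, 0.25), ("Hospital A", 0, 0.5),
    ("Hospital B", 1, 0.125), ("Hospital B", 1, 0.125), ("Hospital B", 0, 0.25),
]
ratios = observed_expected_ratios(cases)
# Hospital A: 1 observed vs 1.0 expected (ratio 1.0)
# Hospital B: 2 observed vs 0.5 expected (ratio 4.0)
```

A ratio above 1 indicates more deaths than the model predicts for that case mix, and a ratio below 1 indicates fewer. With rare outcomes, ratios from small centres are unstable and would ordinarily be reported with a measure of uncertainty.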
Our research was motivated by a province-wide initiative to improve cardiac surgery quality that includes the provision of outcomes reports on key quality indicators for all cardiac centres in Ontario. Although these reports are not released to the public, each cardiac surgery program sees the outcomes of all other surgical centres in an identifiable manner. It should be noted that the practice of public reporting is controversial, as the observed outcomes are influenced by practice variations in patient selection, as well as the fact that even a small excess of adverse events could have a large impact on rates of rare outcomes.33 Interestingly, a population-based cluster randomized trial by Tu and colleagues showed that the public release of hospital-specific quality indicators did not improve outcomes after acute myocardial infarction and congestive heart failure.34 Conversely, in the setting of the Michigan Society of Thoracic and Cardiovascular Surgeons Quality Collaborative, where 33 hospitals participated in quarterly presentation of unblinded data for the purpose of quality improvement through enhanced feedback, a substantial reduction in the rate of postoperative pneumonia was shown after intervention.35 Further studies are needed to determine whether an enhanced feedback system could reduce operative mortality after cardiac surgery.
Limitations
Our study has several limitations. First, important sociodemographic risk factors such as low socioeconomic status are difficult to capture using administrative data sets. Although we included neighbourhood income as a surrogate measure for socioeconomic status, we did not adjust for other determinants of socioeconomic status in this analysis. Second, certain physiologic details, such as specific lesion locations and exact percentage stenoses of coronary lesions, were unavailable in the data sets used. There is evidence that the inclusion of coronary anatomic complexity may improve mortality risk prediction.36 Third, we relied on administrative data and physician billing codes to derive covariates of interest, but the data sources used in this study and associated codes have been previously validated or published.18,37,38 Fourth, our models apply to the 3 most commonly performed cardiac surgery procedures, and the incremental risk of concomitant procedures, such as aortic root enlargement or ascending aorta replacement, was not captured. Fifth, the low event rates for AVR and combined CABG + AVR precluded us from entering a large number of covariates during the modelling process. Despite this, our models performed well in 2 separate validation cohorts. Sixth, our study is limited by a lack of validation outside Ontario. Future opportunities to evaluate the ability of these models to benchmark national cardiac surgery performance are warranted, using data sources such as the Canadian Institute for Health Information. Lastly, continuous model updates are also warranted, to accommodate evolving patient demographics and indications for CABG and AVR.39
Conclusion
Accurate computation of 30-day mortality risk for CABG, AVR and combined CABG + AVR can be achieved parsimoniously using routinely collected multidimensional administrative and clinical registry data sets, with comparable performance to more complex models derived from large, clinical data-derived US and European registries. The parsimonious Ontario cardiac surgery risk scores are a product of province-wide interdisciplinary collaboration among cardiac surgeons, cardiac anesthesiologists, cardiologists and clinical administrators. Hybridization of routinely collected clinical registry and administrative data sources represents an efficient approach to data collection that has utility in system-wide quality of care evaluation and reporting.
Footnotes
Competing interests: Louise Sun received support from the Canadian Institutes of Health Research (CIHR) for article processing charges. Dr. Sun was named National New Investigator by the Heart and Stroke Foundation of Canada, and is supported by a Clinical Research Chair in Big Data and Cardiovascular Outcomes at the University of Ottawa. Douglas Lee is the Ted Rogers Chair in Heart Function Outcomes, University Health Network, University of Toronto. Dr. Lee also received a research grant from CorHealth Ontario and a foundation grant from the Canadian Institutes of Health Research (CIHR). Peter Austin is supported by a Mid-Career Investigator Award from the Heart and Stroke Foundation. Dr. Austin also reports receiving a CIHR Project Grant, paid to Sunnybrook Research Institute. No other competing interests were declared.
This article has been peer reviewed.
Contributors: Louise Sun, Anna Chu, Derrick Tam, Jiming Fang, Peter Austin, Garth Oakes and Douglas Lee contributed to the conception and design of the work. Louise Sun, Anna Chu, Derrick Tam, Xuesong Wang, Jiming Fang, Peter Austin, Natasa Tusevljak, and Douglas Lee contributed to the acquisition and analysis of the data. All of the authors interpreted the data. Louise Sun, Anna Chu and Derrick Tam drafted the manuscript. All of the authors revised the manuscript critically for important intellectual content, gave final approval of the version to be published and agreed to be accountable for all aspects of the work.
Funding: This study was funded by CorHealth Ontario as a part of a province-wide quality initiative, and by a Foundation grant from the Canadian Institutes of Health Research (grant no. FDN 148446). It is also supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health (MOH). The authors acknowledge that the clinical registry data used in this analysis are from participating hospitals through CorHealth Ontario, which serves as an advisory body to the MOH, is funded by the MOH, and is dedicated to improving the quality, efficiency, access and equity in the delivery of the continuum of adult cardiac, vascular and stroke care in Ontario, Canada.
Data sharing: The data set from this study is held securely in coded form at ICES. While legal data sharing agreements between ICES and data providers (e.g., health care organizations and government) prohibit ICES from making the data set publicly available, the full data set creation plan and underlying analytic code are available from the authors upon request, understanding that the computer programs may rely upon coding templates or macros that are unique to ICES and are therefore either inaccessible or may require modification.
Disclaimer: Parts of this material are based on data and/or information compiled and provided by the Canadian Institute for Health Information (CIHI) and Cancer Care Ontario (CCO). The analyses, conclusions, opinions and statements expressed in the manuscript are those of the authors, and do not necessarily reflect those of the above agencies. No endorsement by CIHI or CCO is intended or should be inferred.
Accepted September 23, 2021.
This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY-NC-ND 4.0) licence, which permits use, distribution and reproduction in any medium, provided that the original publication is properly cited, the use is non-commercial (i.e., research or educational use), and no modifications or adaptations are made. See: https://creativecommons.org/licenses/by-nc-nd/4.0/