An analytical tool to identify and correct predictive and socioeconomic bias in machine learning models
iAdeptive Technologies placed second in the 2023 Bias Detection Tools in Healthcare challenge, a competition organized by the National Center for Advancing Translational Sciences (NCATS) at the National Institutes of Health (NIH). Our submission introduced tools for assessing and mitigating predictive and socioeconomic bias through a data-agnostic and model-agnostic methodology. This case study shows how iAdeptive's AI/ML-enabled software development can help identify and mitigate bias in artificial intelligence and machine learning models.
Scope
- AI Product Development
- AI Strategy Consulting
- Data Management & Intelligence
Objective
The objective of the Bias Detection Tools in Healthcare challenge was to develop a tool with the following key goals:
- Assess Predictive Bias: Identifying inaccuracies within an algorithm that result in estimates significantly diverging from the true underlying data.
- Assess Socioeconomic Bias: Detecting biases within model outcomes concerning specific socioeconomic groups or populations, often indicative of systemic healthcare disparities.
- Remediate Predictive and Socioeconomic Bias: After quantifying bias within a model, adjusting it to ensure equitable outcomes across all population subsets.
To illustrate the capabilities of our model bias tool, we used synthetic data to build a classification model that predicts the risk of chronic kidney disease (CKD) progressing from Stage 4 to Stage 5. The tool computed performance metrics by race, gender, and income, pinpointed where the model was biased, and reduced those biases through iterative cutoff adjustment and optimization. Although this case demonstrates the tool's effectiveness, the tool applies broadly to other healthcare scenarios and predictive classification models.
Solution
Approach
Our approach for this use case is as follows:
01. Bias Detection
Our disparity mitigation tool incorporates 12 distinct metrics for quantifying bias, several of which are tied to specific fairness criteria: True Positive Rate (Equal Opportunity), True Negative Rate, Positive Predictive Value (Predictive Parity), Negative Predictive Value, False Negative Rate, False Positive Rate (Predictive Equality), False Discovery Rate, False Omission Rate, Threat Score (also called the Critical Success Index), Positive Rate (Statistical Parity), Accuracy (Overall Accuracy Equality), and F1 Score. The bias detection component calculates these metrics for any specified population within a model and can also analyze Receiver Operating Characteristic (ROC) curves for each group. With this range of metrics, users gain comprehensive insight into both the socioeconomic and the predictive biases in a model, whether across the entire model, within specific demographic or sub-demographic groups, or relative to a privileged demographic group.
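All twelve metrics can be derived from each subgroup's confusion matrix. The following is a minimal Python sketch, not iAdeptive's implementation, assuming binary labels (1 = positive) and hard 0/1 predictions. The function names, and the choice to express disparity as a simple difference from the privileged (reference) group rather than a ratio, are illustrative assumptions.

```python
from collections import defaultdict


def safe_div(num, den):
    # Return None instead of raising when a subgroup has no cases in the denominator
    return num / den if den else None


def fairness_metrics(y_true, y_pred):
    """Compute the twelve metrics from one group's confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + fp + tn + fn
    return {
        "TPR": safe_div(tp, tp + fn),            # equal opportunity
        "TNR": safe_div(tn, tn + fp),
        "PPV": safe_div(tp, tp + fp),            # predictive parity
        "NPV": safe_div(tn, tn + fn),
        "FNR": safe_div(fn, tp + fn),
        "FPR": safe_div(fp, fp + tn),            # predictive equality
        "FDR": safe_div(fp, tp + fp),
        "FOR": safe_div(fn, tn + fn),
        "threat_score": safe_div(tp, tp + fn + fp),
        "positive_rate": safe_div(tp + fp, n),   # statistical parity
        "accuracy": safe_div(tp + tn, n),        # overall accuracy equality
        "F1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


def metrics_by_group(y_true, y_pred, groups):
    """Split records by demographic label and compute the metrics per group."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: fairness_metrics(t, p) for g, (t, p) in buckets.items()}


def disparity_vs_reference(group_metrics, reference):
    """Difference between each group's metric and the reference group's metric."""
    ref = group_metrics[reference]
    return {
        g: {k: (v - ref[k]) if v is not None and ref[k] is not None else None
            for k, v in m.items()}
        for g, m in group_metrics.items()
    }
```

Metrics whose denominator is empty for a subgroup come back as None rather than 0, so a missing value is not mistaken for perfect parity.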
02. Bias Mitigation
The disparity mitigation tool corrects bias within models to make them fairer. It does so by fine-tuning the positive-label threshold for each subgroup through an iterative procedure, with the goal of minimizing disparities in the bias metrics across the chosen demographic subgroups. Because the new thresholds are derived solely from predicted values, actual values, and demographic labels, the method is entirely model-agnostic. For existing models, it also removes the need for retraining, so the benefits of more equitable outputs can be realized immediately.
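The threshold adjustment can be sketched as a per-group search over candidate cutoffs. This hypothetical Python version assumes equal opportunity (true positive rate) as the metric to equalize, a fixed 0.5 cutoff for the reference group, and a 0.01-step grid; the actual tool may use other metrics, objectives, or search procedures.

```python
def true_positive_rate(y_true, scores, threshold):
    """TPR (the equal-opportunity metric) when scores >= threshold count as positive."""
    positive_scores = [s for t, s in zip(y_true, scores) if t == 1]
    if not positive_scores:
        return None
    return sum(1 for s in positive_scores if s >= threshold) / len(positive_scores)


def fit_group_thresholds(y_true, scores, groups, reference, ref_threshold=0.5, steps=101):
    """Hypothetical per-group cutoff search that equalizes TPR with a reference group."""
    by_group = {}
    for t, s, g in zip(y_true, scores, groups):
        labels, group_scores = by_group.setdefault(g, ([], []))
        labels.append(t)
        group_scores.append(s)

    target = true_positive_rate(*by_group[reference], ref_threshold)
    if target is None:
        raise ValueError("reference group has no positive cases")
    candidates = [i / (steps - 1) for i in range(steps)]  # 0.00, 0.01, ..., 1.00

    thresholds = {reference: ref_threshold}
    for g, (labels, group_scores) in by_group.items():
        if g == reference:
            continue
        best, best_gap = ref_threshold, float("inf")
        for c in candidates:
            tpr = true_positive_rate(labels, group_scores, c)
            if tpr is None:
                break  # no positives in this group: keep the reference cutoff
            gap = abs(tpr - target)
            # Smallest gap wins; ties go to the cutoff nearest the reference cutoff.
            closer = abs(c - ref_threshold) < abs(best - ref_threshold)
            if gap < best_gap or (gap == best_gap and closer):
                best, best_gap = c, gap
        thresholds[g] = best
    return thresholds


def apply_thresholds(scores, groups, thresholds):
    """Convert scores to 0/1 labels using each record's group-specific cutoff."""
    return [1 if s >= thresholds[g] else 0 for s, g in zip(scores, groups)]
```

Only scores, true labels, and group labels enter the search, which is what keeps the approach model-agnostic: the underlying model is never retrained, only its cutoffs change.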
HAIP Data Processing and Feature Engineering
The HAIP project used publicly available data from HQR, which went through a comprehensive preprocessing pipeline of data consolidation, aggregation, and standardization to ensure uniform formatting and consistency. The Measure-Specific Dataset then underwent quality checks to confirm that no provider-quarter combination was duplicated and that data values matched the expected format and range.
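The quality checks described above can be expressed as a small validation pass. In this sketch the record layout (`provider_id`, `quarter`, `score`), the `YYYYQn` quarter format, and the 0 to 100 score range are hypothetical placeholders, not the actual HQR schema.

```python
from collections import Counter


def check_measure_dataset(rows, score_range=(0.0, 100.0)):
    """Return human-readable problems found in measure-specific rows.

    Each row is assumed to be a dict with hypothetical keys
    'provider_id', 'quarter' (e.g. '2022Q3'), and 'score'.
    """
    problems = []

    # 1. Each provider should appear at most once per quarter.
    counts = Counter((r["provider_id"], r["quarter"]) for r in rows)
    for (provider, quarter), n in sorted(counts.items()):
        if n > 1:
            problems.append(f"duplicate provider-quarter: {provider} {quarter} ({n} rows)")

    # 2. Quarter labels must match the assumed 'YYYYQn' format.
    for r in rows:
        q = r["quarter"]
        if not (len(q) == 6 and q[:4].isdigit() and q[4] == "Q" and q[5] in "1234"):
            problems.append(f"bad quarter format: {q!r}")

    # 3. Scores must be numeric and fall inside the expected range.
    lo, hi = score_range
    for r in rows:
        s = r["score"]
        if not isinstance(s, (int, float)) or not lo <= s <= hi:
            problems.append(f"score out of range: {r['provider_id']} {r['quarter']} {s!r}")
    return problems
```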
Feature Engineering
Feature engineering is a crucial step when working with datasets with relatively few features. In the HAIP project, feature engineering played a vital role in enhancing model performance. This involved the calculation of rolling mean scores for each provider to assess performance changes over time, as well as the introduction of lagged and differenced scores. Both supervised and unsupervised algorithms were employed to validate model predictions for accuracy.
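The rolling-mean, lagged, and differenced scores can be illustrated with a dependency-free sketch. The field names, the default four-quarter window, and the assumption that quarter labels sort chronologically as strings are illustrative choices, not details of the HAIP pipeline.

```python
from collections import defaultdict


def engineer_features(rows, window=4):
    """Add rolling-mean, lagged, and differenced scores per provider.

    rows: dicts with hypothetical keys 'provider_id', 'quarter', 'score'.
    Quarters are assumed to sort chronologically as strings (e.g. '2022Q1').
    Returns new dicts; features that need unavailable history are None.
    """
    by_provider = defaultdict(list)
    for r in rows:
        by_provider[r["provider_id"]].append(r)

    out = []
    for provider in sorted(by_provider):
        history = sorted(by_provider[provider], key=lambda r: r["quarter"])
        scores = [r["score"] for r in history]
        for i, r in enumerate(history):
            prev = scores[i - 1] if i >= 1 else None
            recent = scores[max(0, i - window + 1): i + 1]
            out.append({
                **r,
                # Only average once a full window of history exists
                "rolling_mean": sum(recent) / len(recent) if len(recent) == window else None,
                "lag_1": prev,
                "diff_1": r["score"] - prev if prev is not None else None,
            })
    return out
```

Rows without enough history get None, mirroring the usual convention (for example, pandas rolling windows with default settings) of leaving incomplete windows empty rather than averaging fewer points.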
Modeling Algorithms
XGBoost:
XGBoost is a predictive modeling algorithm based on gradient-boosted trees. It builds decision trees sequentially, fitting each new tree to correct the errors of the trees before it, and sums their outputs into a single, high-performance model.
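For orientation, XGBoost fits an additive model of K regression trees by minimizing a regularized objective. The general formulation is shown below; the loss l (typically squared error for regression), the leaf count T, the leaf weights w, and the regularization weights gamma and lambda are generic symbols, not the tuned parameters of the HAIP models.

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(\mathbf{x}_i), \qquad
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)

\mathcal{L} = \sum_{i} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k), \qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}
```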
Isolation Forest:
The Isolation Forest algorithm explicitly focuses on modeling anomalies. It builds an ensemble of random trees that isolate observations through random splits; because anomalies are isolated in fewer splits, they receive anomaly scores closer to 1 on a scale from 0 to 1.
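The 0-to-1 anomaly score comes from the standard Isolation Forest definition, reproduced here for reference. Here h(x) is the path length needed to isolate x in one tree, E[h(x)] its average over the forest, n the subsample size used to grow each tree, and c(n) the average path length that normalizes the score. As E[h(x)] approaches 0, s approaches 1; when E[h(x)] equals c(n), s is exactly 0.5.

```latex
s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}}, \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad
H(i) \approx \ln(i) + 0.5772156649
```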
Ensemble Model:
The Ensemble model used an XGBoost regression model, incorporating anomaly scores from the Isolation Forest model as an additional modeling variable. This approach created a chained ensemble model where the results of the Isolation Forest contributed to the XGBoost Regression model. This strategy was chosen to expand the feature space within the modeling dataset and provide additional information for the model’s consideration.
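Structurally, the chained ensemble appends one column to the feature matrix before the regressor is trained. The sketch below shows that plumbing in plain Python. To stay dependency-free it swaps in a simple z-score-based scorer as a hypothetical stand-in for a fitted Isolation Forest; any real scorer with the same row-to-score signature could replace it.

```python
import math
from statistics import mean, pstdev


def zscore_anomaly_scorer(train_rows):
    """Hypothetical stand-in for a fitted Isolation Forest (NOT the real algorithm).

    Maps each row to a score in [0, 1) from its largest absolute z-score,
    so larger values mean 'more unusual', mirroring the Isolation Forest scale.
    """
    columns = list(zip(*train_rows))
    # Constant columns get a unit spread to avoid dividing by zero
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]

    def score(row):
        z = max(abs(v - m) / s for v, (m, s) in zip(row, stats))
        return 1.0 - math.exp(-z)

    return score


def add_anomaly_feature(rows, scorer):
    """Chain step: return copies of the rows with the anomaly score appended."""
    return [list(row) + [scorer(row)] for row in rows]
```

The augmented rows would then be passed to the XGBoost regressor in place of the original features.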
Model Selection
The comparison of candidate models for the XGBoost regression and ensemble models involved a 5-fold cross-validated Root Mean Square Error (RMSE) calculation, averaged over 10 repetitions. In cases where multiple models exhibited similar RMSE values, the model with the most conservative parameters was chosen.
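The repeated cross-validation protocol (5 folds, 10 repetitions, averaged RMSE) and the "most conservative among near-ties" rule can be sketched as follows. The `fit_predict` callable, the averaging of per-fold RMSEs, the 1% tolerance that defines "similar" RMSE, and the numeric `complexity` score used to rank conservativeness are all assumptions for illustration.

```python
import math
import random


def repeated_kfold_rmse(fit_predict, X, y, k=5, repeats=10, seed=0):
    """Mean RMSE over `repeats` independent shuffles of k-fold cross-validation.

    fit_predict(X_train, y_train, X_test) -> predictions for X_test is a
    hypothetical callable wrapping the candidate model. Requires len(y) >= k.
    """
    rng = random.Random(seed)
    n = len(y)
    fold_rmses = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # k nearly equal folds
        for test_idx in folds:
            test_set = set(test_idx)
            train_idx = [i for i in idx if i not in test_set]
            preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                                [X[i] for i in test_idx])
            sq = [(p - y[i]) ** 2 for p, i in zip(preds, test_idx)]
            fold_rmses.append(math.sqrt(sum(sq) / len(sq)))
    return sum(fold_rmses) / len(fold_rmses)


def pick_conservative(candidates, tolerance=0.01):
    """candidates: (name, mean_rmse, complexity) tuples.

    Keep every candidate within `tolerance` (relative) of the best RMSE, then
    return the least complex one; 'complexity' is a hypothetical score such as max_depth.
    """
    best = min(rmse for _, rmse, _ in candidates)
    close = [c for c in candidates if c[1] <= best * (1 + tolerance)]
    return min(close, key=lambda c: c[2])[0]


def mean_baseline(X_train, y_train, X_test):
    """Example fit_predict: always predict the training mean."""
    m = sum(y_train) / len(y_train)
    return [m] * len(X_test)
```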
For the Isolation Forest model, candidate models were evaluated using the Area Under the Receiver Operating Characteristic Curve (AUROC). Similar to the XGBoost models, in situations where multiple models demonstrated comparable AUROC values, the model with the most conservative parameters was selected.
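AUROC equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counted as half. This Python sketch computes it directly from that definition. It assumes a reference set of labeled anomalies is available to score the Isolation Forest against, which the source does not describe. The quadratic pairwise loop is fine for illustration, but a rank-based computation scales better on large datasets.

```python
def auroc(labels, scores):
    """AUROC as the probability a random positive outranks a random negative.

    Ties count as half, which matches the trapezoidal area under the ROC curve.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```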
Subsequently, the XGBoost regression and ensemble models were trained with the final chosen parameters and applied to the entire dataset. For these models, anomaly predictions were assigned by applying the optimal threshold to the absolute difference between the actual and predicted measure scores. For the Isolation Forest model, the optimal threshold was applied directly to the anomaly score.
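The two ways of turning model output into anomaly flags can be sketched as below. How the project defined its "optimal" thresholds is not stated; maximizing Youden's J statistic (true positive rate minus false positive rate) against labeled examples is used here purely as an assumed example of a selection rule.

```python
def flag_by_residual(actual, predicted, threshold):
    """Regression models: flag an anomaly when |actual - predicted| exceeds the cutoff."""
    return [abs(a - p) > threshold for a, p in zip(actual, predicted)]


def flag_by_score(anomaly_scores, threshold):
    """Isolation Forest: flag an anomaly when its score exceeds the cutoff."""
    return [s > threshold for s in anomaly_scores]


def youden_threshold(labels, values):
    """Assumed selection rule: the cutoff maximizing TPR - FPR (Youden's J).

    Candidate cutoffs are the observed values; values strictly above the
    cutoff are flagged, matching flag_by_score and flag_by_residual.
    """
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both anomalous and normal examples")
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(values)):
        flagged = [v > cutoff for v in values]
        tpr = sum(1 for f, lab in zip(flagged, labels) if f and lab == 1) / n_pos
        fpr = sum(1 for f, lab in zip(flagged, labels) if f and lab == 0) / n_neg
        if tpr - fpr > best_j:
            best_cutoff, best_j = cutoff, tpr - fpr
    return best_cutoff
```

For the regression models, the same `youden_threshold` helper would be applied to the absolute residuals instead of the anomaly scores.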
Results
Early Identification of Kidney Disease Progression
As part of the challenge, we built a predictive model for an illustrative use case: forecasting the likelihood that patients with CKD progress from Stage 4 to Stage 5, which denotes a severe decline in kidney function. CKD is a pervasive health concern in the United States, affecting approximately one in seven adults. Alarmingly, many individuals with moderately impaired kidney function are unaware that they have CKD. Despite the relatively low prevalence of CKD in the Medicare fee-for-service (FFS) patient population, treating CKD and end-stage renal disease (ESRD) accounts for a substantial 7% of Medicare's annual expenditure, totaling $120.6 billion.
Predictive models like the one we developed hold considerable promise for clinical practice, research, and public health policy: they support the early identification of at-risk populations and the estimation of CKD progression. However, their practical value is compromised if their predictions are biased, particularly against specific patient subgroups. Our bias detection and mitigation tool is designed to be both data-agnostic and model-agnostic, so it can detect latent biases across diverse demographic variables, including gender, race, and income. It can also adjust biased models to reduce the bias in their outputs. These methods help healthcare professionals uncover and correct hidden or inadvertent biases in models that could otherwise lead to discriminatory outcomes or erroneous diagnoses and prognoses.
The early identification and resolution of these issues are paramount to enhancing patient care over time. Debiased models not only enhance accuracy but also serve as a safeguard against the inadvertent perpetuation of healthcare disparities.
Benefits
This tool holds significant value for the healthcare sector by effectively detecting social and predictive biases across demographic subgroups. The following key benefits underscore its significance:
- Generalized Applicability: The tool's bias measurements and mitigation techniques are highly versatile, working with any classification model and with any demographic variable in the dataset that the user specifies.
- Comprehensive Metrics: The inclusion of twelve diverse metrics ensures that users have a rich set of precise analyses at their disposal for evaluating the fairness and equity of a model.
- Dual Bias Identification: The tool can pinpoint both social and predictive biases. For instance, the 'equal opportunity' metric gauges social fairness by checking whether individuals who truly have the positive condition are correctly predicted as positive (the true positive rate) at the same rate in every subgroup. The 'accuracy' metric gauges predictive fairness by comparing the share of correct predictions across subgroups.
- Enhanced Confidence: By recognizing and correcting systematic bias in healthcare machine learning models, the tool gives practitioners and patients greater confidence that these models are both accurate and equitably applied in health-related decisions.
- Transparency and Specificity: The tool provides in-depth insights into the precise modeling biases and the affected subgroups, enhancing transparency regarding model performance and accuracy.
Conclusion
AI/ML can mitigate and address biases using a data-agnostic and model-agnostic method.
Health processes worldwide involve intricate variables, including disease patterns and human factors. Model complexity varies widely, and in clinical settings, models built on electronic health data often exhibit bias against underprivileged and minority populations. Bias therefore needs to be examined comprehensively at every stage of computation. In domains such as finance, law, and transportation, bias can arise from sampling or from implicit factors. In healthcare, race and ethnicity significantly influence disease prevalence, so these disparities must be taken into account. A more effective approach is to develop distinct algorithms for different demographic groups rather than relying on a single universal one.
Fairness is multifaceted: it requires defining and selecting the appropriate fairness metric and optimization criteria. Inclusivity is crucial, and all stakeholders should be involved in collaboratively determining fair thresholds for the algorithm in question. The mechanisms through which machine learning can exacerbate and propagate predictive and social bias have expanded, and our study explores how AI/ML can mitigate and correct these biases using a data-agnostic, model-agnostic approach. Healthcare in particular calls for deep reflection on structural inequities and for innovative AI/ML applications that can help alleviate the injustices faced by marginalized groups.