Prediction Explanation Clustering#
Run clustering on prediction explanations generated from DataRobot.
Motivation#
Explanation clustering assigns predictions with similar explanations to clusters. This is helpful for understanding model behavior as well as identifying groups in a dataset that may have shared behavior. Once groups are identified, it’s possible to identify features that characterize groups of predictions.
For example, suppose a model predicts that a customer will churn, and the explanations point to a low credit score, short tenure, and a small number of products. The clustering algorithm groups together customers that have similar explanations. This can reveal groups of customers who behave similarly, which can inform a higher-level intervention strategy.
drx makes it easy to run clustering on prediction explanations generated from DataRobot. Each of the four steps below can be done in one line of code.
Usage#
preds = my_deployment.predict(df, max_explanations='all')
featurized_explanations = drx.featurize_explanations(preds)
model = drx.AutoClusteringModel().fit(featurized_explanations)
preds['cluster'] = model.predict(featurized_explanations, wait_for_autopilot=True)
Steps#
1. Get prediction explanations from a deployment.
2. Featurize the explanations so that each column represents a feature, each row represents a prediction, and each value is an explanation strength.
3. Run an AutoClustering project.
4. Get cluster labels from the AutoClustering project, using the same data you trained on.
1. Get prediction explanations#
import pandas as pd
import datarobotx as drx
# Read in scoring data
df_score = pd.read_csv("https://s3.amazonaws.com/datarobot_public/drx/land_sales_scoring.csv")
# Get a deployment
deployment = drx.Deployment("62fce7d1f3fa3668b3f57305")
# Make predictions
preds = deployment.predict(df_score, max_explanations='all')
print(preds.iloc[:3, :5])
|   | prediction | explanation_1_feature_name | explanation_1_actual_value | explanation_1_strength | explanation_2_feature_name |
|---|---|---|---|---|---|
| 0 | 203222 | BsmtExposure | No | -0.01004 | BsmtFinType1 |
| 1 | 195837 | BsmtExposure | Gd | 0.0650181 | BsmtFinType1 |
| 2 | 218750 | BsmtExposure | Mn | 0.00100108 | BsmtFinType1 |
2. Featurize explanations#
featurized_explanations = drx.featurize_explanations(preds)
print(featurized_explanations.iloc[:5, :5].to_markdown())
|   | 1stFlrSF | 2ndFlrSF | BsmtExposure | BsmtFinSF1 | BsmtFinType1 |
|---|---|---|---|---|---|
| 0 | -0.0312692 | 0.0462814 | -0.01004 | 0.0255373 | 0.0098323 |
| 1 | 0.018165 | -0.0325676 | 0.0650181 | 0.0332055 | 0.00746218 |
| 2 | -0.02005 | 0.0472192 | 0.00100108 | 0.0116751 | 0.0098323 |
| 3 | -0.0232044 | 0.0377383 | -0.01004 | -0.00337561 | 0.00746218 |
| 4 | 0.00634016 | 0.0395338 | 0.0064365 | 0.015613 | 0.0098323 |
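To make the reshaping concrete, the following is a minimal sketch of the long-to-wide transformation in plain Python. It only illustrates the idea and is not how drx.featurize_explanations is implemented. The helper name featurize_rows is hypothetical, and filling features that have no explanation in a row with 0.0 is an assumption. The sample values are taken from the tables above.

```python
import re


def featurize_rows(rows, fill_value=0.0):
    """Pivot explanation_N_feature_name / explanation_N_strength pairs
    into one column per feature (hypothetical helper, not part of drx)."""
    pattern = re.compile(r"explanation_(\d+)_feature_name")
    per_row = []
    for row in rows:
        strengths = {}
        for key, feature in row.items():
            match = pattern.fullmatch(key)
            if match and feature is not None:
                # Look up the strength that shares the same explanation index
                strengths[feature] = row[f"explanation_{match.group(1)}_strength"]
        per_row.append(strengths)
    # Use the union of all features, sorted for a stable column order
    columns = sorted({feature for s in per_row for feature in s})
    # Assumption: a feature with no explanation in a row gets fill_value
    return [{c: s.get(c, fill_value) for c in columns} for s in per_row]


preds_rows = [
    {"prediction": 203222,
     "explanation_1_feature_name": "BsmtExposure", "explanation_1_strength": -0.01004,
     "explanation_2_feature_name": "BsmtFinType1", "explanation_2_strength": 0.0098323},
    {"prediction": 195837,
     "explanation_1_feature_name": "BsmtExposure", "explanation_1_strength": 0.0650181,
     "explanation_2_feature_name": "BsmtFinType1", "explanation_2_strength": 0.00746218},
]

wide = featurize_rows(preds_rows)
print(wide[0])
```

Each output row keeps only strength values, keyed by feature name, which is the shape a clustering algorithm can consume directly.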
3. Run an AutoClustering project#
model = drx.AutoClusteringModel(name="ClusterExplainModel").fit(featurized_explanations)
4. Get labels from the AutoClustering project#
preds['cluster'] = model.predict(featurized_explanations, wait_for_autopilot=True)
print(preds[['prediction', 'cluster']].head())
|   | prediction | cluster |
|---|---|---|
| 0 | 203222 | Cluster 1 |
| 1 | 195837 | Cluster 1 |
| 2 | 218750 | Cluster 1 |
| 3 | 166764 | Cluster 3 |
| 4 | 298480 | Cluster 2 |
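Once labels are available, one way to see which features characterize each cluster is to average the explanation strengths within each cluster and rank features by magnitude. The sketch below does this in plain Python using the first three feature columns and the five labels shown above. The helper names mean_strength_by_cluster and top_features are hypothetical, and using the mean strength as the summary statistic is an assumption, not a drx or DataRobot feature. With only five rows the result is purely illustrative.

```python
from collections import defaultdict


def mean_strength_by_cluster(featurized_rows, labels):
    """Average each feature's explanation strength within each cluster."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row, label in zip(featurized_rows, labels):
        counts[label] += 1
        for feature, strength in row.items():
            sums[label][feature] += strength
    return {
        label: {f: total / counts[label] for f, total in feats.items()}
        for label, feats in sums.items()
    }


def top_features(cluster_means, n=2):
    """Rank one cluster's features by the magnitude of their mean strength."""
    return sorted(cluster_means, key=lambda f: abs(cluster_means[f]), reverse=True)[:n]


# Values copied from the featurized table and the cluster labels above
featurized_rows = [
    {"1stFlrSF": -0.0312692, "2ndFlrSF": 0.0462814, "BsmtExposure": -0.01004},
    {"1stFlrSF": 0.018165, "2ndFlrSF": -0.0325676, "BsmtExposure": 0.0650181},
    {"1stFlrSF": -0.02005, "2ndFlrSF": 0.0472192, "BsmtExposure": 0.00100108},
    {"1stFlrSF": -0.0232044, "2ndFlrSF": 0.0377383, "BsmtExposure": -0.01004},
    {"1stFlrSF": 0.00634016, "2ndFlrSF": 0.0395338, "BsmtExposure": 0.0064365},
]
labels = ["Cluster 1", "Cluster 1", "Cluster 1", "Cluster 3", "Cluster 2"]

means = mean_strength_by_cluster(featurized_rows, labels)
for label in sorted(means):
    print(label, top_features(means[label]))
```

Ranking by absolute value treats strongly negative and strongly positive drivers alike. Keep the sign of the mean if the direction of the effect matters for the intervention.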
API Reference#
| Object | Description |
|---|---|
| Deployment | DataRobot ML Ops deployment. |
| AutoClusteringModel | Automated clustering orchestrator. |
| featurize_explanations | Featurizes a dataframe of explanations into strength values by feature. |