Prediction Explanation Clustering

Run clustering on prediction explanations generated from DataRobot.

Motivation

Explanation clustering groups predictions whose explanations are similar. This helps in understanding model behavior and in finding segments of a dataset that the model treats alike. Once the clusters are found, you can identify the features that characterize each one.

For example, suppose a churn model predicts that a customer will churn because they have a low credit score, short tenure, and few products. Clustering places that customer with others whose predictions have similar explanations. Groups of customers with similar behavior can then inform a higher-level intervention strategy.

drx makes it easy to run clustering on prediction explanations generated from DataRobot. Each of the four steps below can be done in one line of code.

Usage

preds = my_deployment.predict(df, max_explanations='all')
featurized_explanations = drx.featurize_explanations(preds)
model = drx.AutoClusteringModel().fit(featurized_explanations)
preds['cluster'] = model.predict(featurized_explanations, wait_for_autopilot=True)

Steps

1. Get prediction explanations from a deployment.
2. Featurize the explanations so that each column represents a feature, each row represents a prediction, and each value is that feature's explanation strength for that prediction.
3. Run an AutoClustering project on the featurized explanations.
4. Get cluster labels from the AutoClustering project, using the same data you trained on.

1. Get prediction explanations

import pandas as pd
import datarobotx as drx

# Read in scoring data
df_score = pd.read_csv("https://s3.amazonaws.com/datarobot_public/drx/land_sales_scoring.csv")

# Get a deployment
deployment = drx.Deployment("62fce7d1f3fa3668b3f57305")

# Make predictions
preds = deployment.predict(df_score, max_explanations='all')
print(preds.iloc[:3, :5])

	prediction	explanation_1_feature_name	explanation_1_actual_value	explanation_1_strength	explanation_2_feature_name
0	203222	BsmtExposure	No	-0.01004	BsmtFinType1
1	195837	BsmtExposure	Gd	0.0650181	BsmtFinType1
2	218750	BsmtExposure	Mn	0.00100108	BsmtFinType1

2. Featurize explanations

featurized_explanations = drx.featurize_explanations(preds)

print(featurized_explanations.iloc[:5, :5].to_markdown())

	1stFlrSF	2ndFlrSF	BsmtExposure	BsmtFinSF1	BsmtFinType1
0	-0.0312692	0.0462814	-0.01004	0.0255373	0.0098323
1	0.018165	-0.0325676	0.0650181	0.0332055	0.00746218
2	-0.02005	0.0472192	0.00100108	0.0116751	0.0098323
3	-0.0232044	0.0377383	-0.01004	-0.00337561	0.00746218
4	0.00634016	0.0395338	0.0064365	0.015613	0.0098323
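To make the reshaping concrete, here is a minimal pandas sketch of the idea: it pivots the `explanation_N_feature_name` / `explanation_N_strength` column pairs into one strength column per feature. It is an illustration only, not the drx implementation. The function name `featurize_explanations_sketch`, the `explanation_2_strength` column name (inferred from the naming pattern), and the choice to fill features without an explanation with 0.0 are assumptions.

```python
import pandas as pd


def featurize_explanations_sketch(preds: pd.DataFrame) -> pd.DataFrame:
    """Pivot explanation_N_* columns into one strength column per feature.

    Hypothetical helper for illustration; NOT the drx implementation.
    """
    # Find the explanation slots present (1, 2, ...) from the column names
    slots = sorted(
        int(col.split("_")[1])
        for col in preds.columns
        if col.startswith("explanation_") and col.endswith("_feature_name")
    )
    # Stack every (row, feature, strength) triple into long form
    long_parts = [
        pd.DataFrame(
            {
                "row": preds.index,
                "feature": preds[f"explanation_{i}_feature_name"].to_numpy(),
                "strength": preds[f"explanation_{i}_strength"].to_numpy(),
            }
        )
        for i in slots
    ]
    long_form = pd.concat(long_parts, ignore_index=True).dropna(subset=["feature"])
    # One row per prediction, one column per feature.
    # Assumption: a feature with no explanation for a row gets strength 0.0.
    wide = long_form.pivot_table(
        index="row", columns="feature", values="strength",
        aggfunc="first", fill_value=0.0,
    )
    wide.columns.name = None
    wide.index.name = None
    return wide.reindex(preds.index, fill_value=0.0)


# Toy input shaped like the prediction output in step 1
toy_preds = pd.DataFrame(
    {
        "prediction": [203222, 195837],
        "explanation_1_feature_name": ["BsmtExposure", "BsmtExposure"],
        "explanation_1_strength": [-0.01004, 0.0650181],
        "explanation_2_feature_name": ["BsmtFinType1", "BsmtFinType1"],
        "explanation_2_strength": [0.0098323, 0.00746218],
    }
)
toy_features = featurize_explanations_sketch(toy_preds)
print(toy_features)
```

The result has one row per prediction and one column per explained feature, which is the shape a clustering algorithm expects.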

3. Run an AutoClustering project

model = drx.AutoClusteringModel(name="ClusterExplainModel").fit(featurized_explanations)
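AutoClustering runs inside DataRobot, and this page does not describe which algorithms it tries. For intuition about what it means to cluster explanation vectors, the following is a tiny k-means written in plain Python as a local stand-in. It is a swapped-in technique, not what `AutoClusteringModel` does; the function name `kmeans_sketch`, the first-k-rows initialization, and the toy strength values are all assumptions.

```python
from math import dist


def kmeans_sketch(rows, k, n_iter=20):
    """Tiny k-means stand-in: returns a cluster index for each row."""
    # Naive initialization (assumption): use the first k rows as centroids
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance
        labels = [
            min(range(k), key=lambda c: dist(row, centroids[c]))
            for row in rows
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical explanation strengths for two features, one tuple per prediction.
# The first two predictions are driven up by feature 1, the last two driven down.
toy_rows = [(0.30, -0.02), (0.28, 0.01), (-0.25, 0.02), (-0.27, -0.01)]
labels = kmeans_sketch(toy_rows, k=2)
print(labels)
```

Predictions whose explanations point the same way end up with the same label, which is the behavior the AutoClustering step relies on at larger scale.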

4. Get labels from AutoClustering project

preds['cluster'] = model.predict(featurized_explanations, wait_for_autopilot=True)
print(preds[['prediction', 'cluster']].head())

	prediction	cluster
0	203222	Cluster 1
1	195837	Cluster 1
2	218750	Cluster 1
3	166764	Cluster 3
4	298480	Cluster 2
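Once each prediction has a cluster label, one way to see which features characterize each group is to average the explanation strengths per cluster. The sketch below uses toy stand-ins for `featurized_explanations` and the `cluster` column; the values and the names `toy_strengths`, `toy_clusters`, `profile`, and `top_feature` are illustrative assumptions, not output from the deployment above.

```python
import pandas as pd

# Toy stand-in for featurized_explanations (values are made up)
toy_strengths = pd.DataFrame(
    {
        "1stFlrSF": [-0.03, 0.02, -0.02, 0.25, 0.27],
        "2ndFlrSF": [0.05, 0.04, 0.05, -0.01, 0.00],
        "BsmtExposure": [-0.01, 0.07, 0.00, 0.01, -0.01],
    }
)
# Toy stand-in for preds['cluster']
toy_clusters = pd.Series(
    ["Cluster 1", "Cluster 1", "Cluster 1", "Cluster 2", "Cluster 2"],
    name="cluster",
)

# Mean explanation strength of each feature within each cluster
profile = toy_strengths.groupby(toy_clusters).mean()

# For each cluster, the feature with the largest mean absolute strength
top_feature = profile.abs().idxmax(axis=1)

print(profile.round(3))
print(top_feature)
```

With real data, the same pattern would be `featurized_explanations.groupby(preds['cluster']).mean()`, assuming both frames share the same row index.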

API Reference

`Deployment`([deployment_id])	DataRobot MLOps deployment.
`AutoClusteringModel`([name, n_clusters])	Automated clustering orchestrator.
`featurize_explanations`(X)	Featurizes a dataframe of explanations into strength values by feature.