Prediction Explanation Clustering

Run clustering on prediction explanations generated from DataRobot.

Motivation

Explanation clustering groups predictions whose explanations are similar. This helps you understand how a model behaves and find groups of rows in a dataset that share behavior. Once the groups are found, you can identify the features that characterize each group of predictions.

For example, a churn model may predict that a customer will churn because they have a low credit score, short tenure, and few products. The clustering algorithm groups together customers whose predictions have similar explanations. This can reveal segments of customers with similar behavior, which can inform a higher-level intervention strategy.

drx makes it easy to run clustering on prediction explanations generated from DataRobot. Each of the four steps below can be done in one line of code.

Usage

import datarobotx as drx

# my_deployment: a drx.Deployment; df: a pandas DataFrame of scoring data
preds = my_deployment.predict(df, max_explanations='all')
featurized_explanations = drx.featurize_explanations(preds)
model = drx.AutoClusteringModel().fit(featurized_explanations)
preds['cluster'] = model.predict(featurized_explanations, wait_for_autopilot=True)

Steps

  1. Get prediction explanations from a deployment.

  2. Featurize the explanations so that each row corresponds to one prediction, each column to a feature, and each value is that feature's explanation strength.

  3. Run an AutoClustering project on the featurized explanations.

  4. Get cluster labels from the AutoClustering project by predicting on the same featurized data used for training.

1. Get prediction explanations

import pandas as pd
import datarobotx as drx

# Read in scoring data
df_score = pd.read_csv("https://s3.amazonaws.com/datarobot_public/drx/land_sales_scoring.csv")

# Get a deployment
deployment = drx.Deployment("62fce7d1f3fa3668b3f57305")

# Make predictions
preds = deployment.predict(df_score, max_explanations='all')
print(preds.iloc[:3, :5])

|   | prediction | explanation_1_feature_name | explanation_1_actual_value | explanation_1_strength | explanation_2_feature_name |
|---|---|---|---|---|---|
| 0 | 203222 | BsmtExposure | No | -0.01004 | BsmtFinType1 |
| 1 | 195837 | BsmtExposure | Gd | 0.0650181 | BsmtFinType1 |
| 2 | 218750 | BsmtExposure | Mn | 0.00100108 | BsmtFinType1 |

2. Featurize explanations

featurized_explanations = drx.featurize_explanations(preds)

print(featurized_explanations.iloc[:5, :5].to_markdown())

|   | 1stFlrSF | 2ndFlrSF | BsmtExposure | BsmtFinSF1 | BsmtFinType1 |
|---|---|---|---|---|---|
| 0 | -0.0312692 | 0.0462814 | -0.01004 | 0.0255373 | 0.0098323 |
| 1 | 0.018165 | -0.0325676 | 0.0650181 | 0.0332055 | 0.00746218 |
| 2 | -0.02005 | 0.0472192 | 0.00100108 | 0.0116751 | 0.0098323 |
| 3 | -0.0232044 | 0.0377383 | -0.01004 | -0.00337561 | 0.00746218 |
| 4 | 0.00634016 | 0.0395338 | 0.0064365 | 0.015613 | 0.0098323 |
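To make the reshaping concrete, here is a minimal pandas sketch of turning the wide explanation columns into one column per feature. It is not drx's implementation: the function name featurize_explanations_sketch is hypothetical, it assumes the explanation_<i>_feature_name / explanation_<i>_strength column naming seen in step 1, and it assumes that a feature missing from a row's explanations should be filled with 0. The two input rows reuse values shown in the tables above.

```python
import re

import pandas as pd


def featurize_explanations_sketch(preds: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for drx.featurize_explanations (not its actual code)."""
    # Collect explanation indices from columns named like 'explanation_3_feature_name'
    pattern = re.compile(r"^explanation_(\d+)_feature_name$")
    matches = (pattern.match(col) for col in preds.columns)
    indices = sorted(int(m.group(1)) for m in matches if m)

    # Stack every (row, feature, strength) triple into one long table
    long = pd.concat(
        [
            pd.DataFrame({
                "row": preds.index,
                "feature": preds[f"explanation_{i}_feature_name"].to_numpy(),
                "strength": preds[f"explanation_{i}_strength"].to_numpy(),
            })
            for i in indices
        ],
        ignore_index=True,
    ).dropna(subset=["feature"])

    # Pivot to one column per feature; absent features become 0.0 (an assumption)
    wide = long.pivot_table(index="row", columns="feature", values="strength",
                            aggfunc="sum", fill_value=0.0)
    wide.index.name = None
    wide.columns.name = None
    return wide


# Two rows built from values in the tables above
preds = pd.DataFrame({
    "prediction": [203222, 195837],
    "explanation_1_feature_name": ["BsmtExposure", "BsmtExposure"],
    "explanation_1_actual_value": ["No", "Gd"],
    "explanation_1_strength": [-0.01004, 0.0650181],
    "explanation_2_feature_name": ["BsmtFinType1", "BsmtFinType1"],
    "explanation_2_strength": [0.0098323, 0.00746218],
})
features = featurize_explanations_sketch(preds)
print(features)
```

The long-then-pivot shape is a design choice: it handles any number of explanations per row without knowing the feature names in advance.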

3. Run an AutoClustering project

model = drx.AutoClusteringModel(name="ClusterExplainModel").fit(featurized_explanations)
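AutoClustering lets DataRobot choose and train clustering models for you. For intuition about what clustering the featurized explanations means, the sketch below swaps in plain k-means (Lloyd's algorithm) written with NumPy: each prediction is a point whose coordinates are its explanation strengths, and points are grouped by Euclidean distance. This is not necessarily the algorithm DataRobot selects; the kmeans function, k=2, and the seed are illustrative choices, and the five rows are copied from the featurized table above.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's k-means; an illustrative stand-in, not DataRobot's AutoClustering."""
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct rows chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid, shape (n, k)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Rows 0-4 of the featurized explanations (columns 1stFlrSF ... BsmtFinType1)
X = np.array([
    [-0.0312692, 0.0462814, -0.01004, 0.0255373, 0.0098323],
    [0.018165, -0.0325676, 0.0650181, 0.0332055, 0.00746218],
    [-0.02005, 0.0472192, 0.00100108, 0.0116751, 0.0098323],
    [-0.0232044, 0.0377383, -0.01004, -0.00337561, 0.00746218],
    [0.00634016, 0.0395338, 0.0064365, 0.015613, 0.0098323],
])
labels, centers = kmeans(X, k=2)
print(labels)
```

The sketch returns 0-based integer labels, whereas the AutoClustering project reports names such as "Cluster 1".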

4. Get labels from the AutoClustering project

preds['cluster'] = model.predict(featurized_explanations, wait_for_autopilot=True)
print(preds[['prediction', 'cluster']].head())

|   | prediction | cluster |
|---|---|---|
| 0 | 203222 | Cluster 1 |
| 1 | 195837 | Cluster 1 |
| 2 | 218750 | Cluster 1 |
| 3 | 166764 | Cluster 3 |
| 4 | 298480 | Cluster 2 |
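With a cluster label on every prediction, one simple way to find the features that characterize each group is to average the explanation strengths within each cluster and rank features by the magnitude of that average. This is a minimal pandas sketch, assuming featurized_explanations and preds share the same row index (as the outputs above suggest). The helper top_features_by_cluster is hypothetical and not part of drx, and the five rows and their labels are copied from the tables above, so the result is only illustrative.

```python
import pandas as pd


def top_features_by_cluster(features: pd.DataFrame, clusters: pd.Series, n: int = 3) -> dict:
    """Hypothetical helper: the n features with the largest absolute mean strength per cluster."""
    # Mean explanation strength of every feature within each cluster
    means = features.groupby(clusters).mean()
    result = {}
    for cluster, row in means.iterrows():
        # Rank by magnitude so features that push predictions down are kept too
        ranked = row.abs().sort_values(ascending=False)
        result[cluster] = list(ranked.index[:n])
    return result


features = pd.DataFrame({
    "1stFlrSF": [-0.0312692, 0.018165, -0.02005, -0.0232044, 0.00634016],
    "2ndFlrSF": [0.0462814, -0.0325676, 0.0472192, 0.0377383, 0.0395338],
    "BsmtExposure": [-0.01004, 0.0650181, 0.00100108, -0.01004, 0.0064365],
    "BsmtFinSF1": [0.0255373, 0.0332055, 0.0116751, -0.00337561, 0.015613],
    "BsmtFinType1": [0.0098323, 0.00746218, 0.0098323, 0.00746218, 0.0098323],
})
clusters = pd.Series(["Cluster 1", "Cluster 1", "Cluster 1", "Cluster 3", "Cluster 2"],
                     name="cluster")
summary = top_features_by_cluster(features, clusters, n=2)
print(summary)
```

On the full data, the equivalent call would be top_features_by_cluster(featurized_explanations, preds['cluster']).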

API Reference

| Object | Description |
|---|---|
| Deployment([deployment_id]) | DataRobot ML Ops deployment. |
| AutoClusteringModel([name, n_clusters]) | Automated clustering orchestrator. |
| featurize_explanations(X) | Featurizes a dataframe of explanations into strength values by feature. |