AutoClustering

Use DataRobot Autopilot to train diverse clustering models on a dataset.

Prediction and deployment methods use whichever clustering model has the highest silhouette score at the time of the call. Training runs in a new, automatically created DataRobot project.
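To make the selection criterion concrete, here is a minimal sketch of choosing among candidate clusterings by silhouette score. It is not the DataRobot implementation: the `silhouette_score` helper, the sample points, and the candidate labelings are all hypothetical, and Euclidean distance is assumed.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the other members of a cluster.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a coefficient of 0
        a = mean_dist(i, own)  # cohesion: distance to own cluster
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != label)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical labelings, keyed by the number of clusters each one uses.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
}

scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
```

In this toy data the two-cluster labeling scores higher, so it is the one that would be selected. Scores closer to 1 indicate tighter, better-separated clusters.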

Usage

Train

import pandas as pd
import datarobotx as drx

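# Load the training data
train = pd.read_csv('https://s3.amazonaws.com/datarobot_public_datasets/credit-train-full_80.csv')
# Candidate cluster counts to try
model = drx.AutoClusteringModel(n_clusters=[2, 3, 5])
# Create a new DataRobot project and train clustering models
model.fit(train)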

Predict

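# Load the data to score
test = pd.read_csv('https://s3.amazonaws.com/datarobot_public_datasets/credit-train-full_20.csv')
# Cluster assignment for each row
predictions = model.predict(test)
# Per-cluster membership probabilities for each row
cluster_probabilities = model.predict_proba(test)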
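The relationship between the two outputs can be illustrated without calling DataRobot. The sketch below assumes each row carries one probability per cluster and that the predicted cluster is the most probable one. The cluster names and values are hypothetical and are not actual datarobotx output.

```python
# Hypothetical probability rows: one dict per record, one entry per cluster.
probabilities = [
    {"Cluster 1": 0.70, "Cluster 2": 0.20, "Cluster 3": 0.10},
    {"Cluster 1": 0.05, "Cluster 2": 0.15, "Cluster 3": 0.80},
    {"Cluster 1": 0.30, "Cluster 2": 0.45, "Cluster 3": 0.25},
]


def hard_assignment(row):
    """Return the cluster name with the highest probability."""
    return max(row, key=row.get)


def margin(row):
    """Gap between the top two probabilities; small values flag ambiguous rows."""
    top, runner_up = sorted(row.values(), reverse=True)[:2]
    return top - runner_up


labels = [hard_assignment(row) for row in probabilities]
margins = [round(margin(row), 2) for row in probabilities]
```

A small margin, as in the third row, suggests the record sits near the boundary between two clusters.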

Deploy

deployment = model.deploy()

Time Series

Time series clustering models are also supported. See the AutoTS docs for an example.

API Reference

AutoClusteringModel([name, n_clusters])

Automated clustering orchestrator.

AutoTSModel([name, feature_window, ...])

AutoTS orchestrator.

DRConfig([Data, Target, Featurization, ...])

DataRobot configuration.

Deployment([deployment_id])

DataRobot MLOps deployment.