Time Series Clustering and Segmented Modeling#

../_images/clustersegmentmodel.png

Run a segmented modeling project where the segments are determined using time series clustering.

Motivation#

In time series problems with many series, it can be useful to segment the data into groups of series that exhibit similar behavior. DataRobot can form these groups with time series clustering. Once each series is assigned to a cluster, a separate model can be built for each cluster. Clustering and segmenting can improve overall model performance and make model behavior easier to understand at a more granular level.

Segmented modeling takes patience

DataRobot requires Autopilot to run to completion before the segmented model can be used for predictions or deployed. Until the project has finished running, the predict and deploy methods raise an error.

Steps#

  1. Train a time series clustering model on a multiseries dataset.

  2. Assign clustering labels to each series in the multiseries dataset.

  3. Run a segmented modeling project using the clustering labels as the segmentation column.

1. Train a time series clustering model#

import pandas as pd
import datarobotx as drx

# Load the multiseries dataset and keep only the columns needed for clustering
df = pd.read_csv("https://s3.amazonaws.com/datarobot_public_datasets/TimeSeriesClustering/acme_234series.csv")
cluster_data = df[['Date', 'seriesID', 'SalesQty']]

ts_cluster_model = drx.AutoTSModel()

# Configure an unsupervised clustering project that tries 3, 4, and 5 clusters
ts_cluster_model.set_params(
    unsupervised_type='clustering',
    disable_holdout=True,
    autopilot_cluster_list=[3, 4, 5]
)

# No target is passed because clustering is unsupervised
ts_cluster_model.fit(
    cluster_data,
    datetime_partition_column='Date',
    multiseries_id_columns='seriesID'
)

2. Assign clustering labels#

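# Predict a cluster label for the clustering data
predicted_labels = ts_cluster_model.predict(cluster_data)

# Build a series -> cluster lookup. As written, the series IDs are read from the
# seriesId column of the predictions (capitalized differently from seriesID in df).
cluster_mapping = dict(zip(predicted_labels.seriesId, predicted_labels.prediction))

# Label every row of the original data with its series' cluster
df['Cluster'] = df['seriesID'].map(cluster_mapping)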
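
Before using the new column for segmentation, it can be worth checking that every series actually received a label, since `Series.map` leaves `NaN` wherever a series ID is missing from the mapping. The following is a minimal sketch of such a check using only pandas. The toy `df` and `cluster_mapping` are stand-ins for the objects built above, and the label values ("Cluster 1", "Cluster 2") are hypothetical, not actual DataRobot output.

```python
import pandas as pd

# Toy stand-ins for the objects in this section (assumed shapes, not real output).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-01-02"] * 3),
    "seriesID": ["A", "A", "B", "B", "C", "C"],
    "SalesQty": [10, 12, 7, 9, 3, 4],
})
# Hypothetical labels; series "C" is deliberately left out of the mapping.
cluster_mapping = {"A": "Cluster 1", "B": "Cluster 2"}

df["Cluster"] = df["seriesID"].map(cluster_mapping)

# Series whose ID was not found in the mapping end up with NaN in 'Cluster'.
unmapped = sorted(df.loc[df["Cluster"].isna(), "seriesID"].unique())

# Number of distinct series per cluster (rows with a NaN cluster are excluded).
series_per_cluster = df.groupby("Cluster")["seriesID"].nunique()

print("Unmapped series:", unmapped)
print(series_per_cluster)
```

If `unmapped` is non-empty, one likely cause is a mismatch between the series IDs in the predictions and those in the original data, which is easier to resolve before starting the segmented project than after.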

3. Run a segmented modeling project#

ts_segmented_model = drx.AutoTSModel()

# Use the cluster label as the segment ID so each cluster gets its own model
ts_segmented_model.fit(
    df,
    target="SalesQty",
    datetime_partition_column="Date",
    multiseries_id_columns="seriesID",
    user_defined_segment_id_columns="Cluster",
)

API Reference#

AutoTSModel([name, feature_window, ...])

AutoTS orchestrator.

DRConfig([Data, Target, Featurization, ...])

DataRobot configuration.