ColumnReduce#

Apply Autopilot iteratively to a dataset to identify the subset of columns that produces the most accurate trained model.

Common uses for this strategy include:

  • Training more interpretable models

  • Reducing extremely wide datasets by leveraging the strategy within an outer loop

  • Improving model performance

Feature Importance Rank Ensembling (FIRE) is currently the only implemented approach.
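One common way to ensemble feature importance rankings is to rank the features within each model and then order them by their median rank across models. The sketch below illustrates that idea in plain Python. It is an assumption about the general technique, not the datarobotx implementation, and the function name `median_rank_ensemble` and the impact scores are hypothetical.

```python
from statistics import median


def median_rank_ensemble(impacts_per_model):
    """Order features by their median importance rank across several models.

    impacts_per_model: list of dicts mapping feature name -> impact score
    (higher means more important). Hypothetical input format for illustration.
    """
    # Collect every feature seen by any model, in a deterministic order.
    features = sorted(set().union(*impacts_per_model))
    ranks = {name: [] for name in features}

    for impacts in impacts_per_model:
        # Rank 1 is the most important feature for this model;
        # a feature missing from a model is treated as having zero impact.
        ordered = sorted(features, key=lambda name: impacts.get(name, 0.0), reverse=True)
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)

    median_ranks = {name: median(positions) for name, positions in ranks.items()}
    # Lower median rank first; ties are broken alphabetically.
    return sorted(features, key=lambda name: (median_ranks[name], name))


# Made-up impact scores from three models.
example_impacts = [
    {"a": 0.9, "b": 0.5, "c": 0.1},
    {"a": 0.7, "b": 0.8, "c": 0.2},
    {"a": 0.6, "b": 0.3, "c": 0.4},
]
ranking = median_rank_ensemble(example_impacts)
top_two = ranking[:2]  # keep only the highest-ranked features
```

Using the median rather than the mean keeps a single model's outlying ranking from dominating the aggregate order.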

Training on the column-reduced data is delegated to the provided base model. Blenders and frozen models are excluded from champion model consideration.

Usage#

Train#

import pandas as pd
import datarobotx as drx

df = pd.read_csv('https://s3.amazonaws.com/datarobot_public_datasets/madelon_combined_80.csv')

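# Base model that is trained on each column-reduced dataset
base_model = drx.AutoMLModel()
# Wrap the base model in the column reduction orchestrator
model = drx.ColumnReduceModel(base_model,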
                              ranking_ensemble_size=3,
                              initial_lives=2)
model.fit(df, target='y')
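The `initial_lives` argument can be read as a patience budget for the iterative search. The sketch below assumes that a life is lost each time a reduced feature set fails to beat the best score so far, and that the search stops when no lives remain. That interpretation, along with the names `reduce_with_lives`, `score_fn`, and `shrink_fn`, is an assumption made for illustration and is not taken from the datarobotx source.

```python
def reduce_with_lives(features, score_fn, shrink_fn, initial_lives=2):
    """Iteratively shrink a feature list, keeping the best-scoring subset.

    score_fn(features) -> numeric score, higher is better (hypothetical).
    shrink_fn(features) -> a shorter feature list (hypothetical).
    """
    best_features = list(features)
    best_score = score_fn(best_features)
    current = best_features
    lives = initial_lives

    while lives > 0 and len(current) > 1:
        candidate = shrink_fn(current)
        if len(candidate) >= len(current):
            break  # nothing was removed, so further iterations cannot help
        score = score_fn(candidate)
        if score > best_score:
            best_features, best_score = candidate, score
        else:
            lives -= 1  # no improvement costs one life
        current = candidate

    return best_features, best_score


# Toy setup: only "a" and "b" are informative; every extra column costs 1 point.
def toy_score(features):
    informative = {"a", "b"}
    return 10 * len(informative & set(features)) - len(features)


def drop_last(features):
    # Assumes the list is already ordered from most to least important.
    return features[:-1]


champion, champion_score = reduce_with_lives(["a", "b", "c", "d"], toy_score, drop_last)
```

In this toy run the score improves while uninformative columns are dropped and falls once an informative one is removed, so the two-column subset is retained as the champion.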

Run on existing project#

# ID of an existing DataRobot project (placeholder value)
project_id = '123456789'
model = drx.ColumnReduceModel.run_column_reduction(project_id,
                                                   ranking_ensemble_size=3,
                                                   initial_lives=2)

Retrieve features from the champion model#

features = model.features

Predict#

test_df = pd.read_csv('https://s3.amazonaws.com/datarobot_public_datasets/madelon_combined_20.csv')

predictions = model.predict(test_df)
class_probs = model.predict_proba(test_df)

Deploy#

deployment = model.deploy()

API Reference#

  • ColumnReduceModel(base_model[, ...]): Column reduction orchestrator.

  • Deployment([deployment_id]): DataRobot ML Ops deployment.