ColumnReduce#
Apply Autopilot iteratively to a dataset to identify the subset of columns that produces the most accurate trained model.
Common uses for this strategy include:
- Training more interpretable models
- Reducing extremely wide datasets by running the strategy within an outer loop
- Improving model performance
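The outer-loop usage for very wide datasets could be sketched as follows. This is only one possible reading, not something this page prescribes: the columns are split into chunks, each chunk is reduced independently, and the survivors are reduced once more. `reduce_wide`, `reduce_chunk`, and `chunk_size` are hypothetical names introduced for illustration; `reduce_chunk` stands in for whatever fits a column reduction model and returns its retained features.

```python
def reduce_wide(columns, reduce_chunk, chunk_size=100):
    """Reduce a wide column list chunk by chunk, then reduce the survivors.

    `reduce_chunk(columns)` is a hypothetical callback that returns the
    subset of `columns` worth keeping (for example, the features of a
    fitted column reduction model).
    """
    survivors = []
    for start in range(0, len(columns), chunk_size):
        chunk = columns[start:start + chunk_size]
        survivors.extend(reduce_chunk(chunk))
    # Final pass so that columns from different chunks compete with each other.
    return reduce_chunk(survivors)
```

In a supervised setting, each chunk would also need to carry the target column alongside its candidate features.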
Feature Importance Rank Ensembling (FIRE) is the currently implemented approach.
The column reduction model delegates training on the column-reduced data to the provided base model. Blenders and frozen models are excluded from champion model consideration.
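To make the idea of rank ensembling concrete, here is a minimal sketch in plain Python. It is not the datarobotx implementation. It assumes that feature importances from several models are combined by median rank, that a fixed fraction of the top-ranked features is kept each round, and that a round without improvement costs one "life". The names `median_rank_order`, `reduce_columns`, `evaluate`, and `keep_fraction` are hypothetical, and the way `initial_lives` is interpreted here is an assumption.

```python
from statistics import median


def median_rank_order(importances):
    """Order features by their median rank across several models.

    `importances` is a list of dicts mapping feature name -> importance,
    one dict per model in the ranking ensemble (assumed input format).
    """
    ranks = {}
    for scores in importances:
        # Rank 1 is the most important feature for this model.
        ordered = sorted(scores, key=lambda name: (-scores[name], name))
        for rank, name in enumerate(ordered, start=1):
            ranks.setdefault(name, []).append(rank)
    # A lower median rank means more important; ties are broken by name.
    return sorted(ranks, key=lambda name: (median(ranks[name]), name))


def reduce_columns(features, evaluate, keep_fraction=0.5, initial_lives=2):
    """Shrink the feature list until the lives run out.

    `evaluate(features)` is a hypothetical callback returning
    (score, importances), where a higher score is better.
    """
    best_score, importances = evaluate(features)
    best_features = list(features)
    current = list(features)
    lives = initial_lives
    while lives > 0 and len(current) > 1:
        ordered = median_rank_order(importances)
        keep = max(1, int(len(ordered) * keep_fraction))
        current = ordered[:keep]
        score, importances = evaluate(current)
        if score > best_score:
            best_score, best_features = score, list(current)
        else:
            lives -= 1  # a non-improving round costs one life
    return best_features, best_score
```

In this sketch the number of dicts passed to `median_rank_order` plays the role of the ranking ensemble size, and the best-scoring feature list seen so far is what gets returned.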
Usage#
Train#
import pandas as pd
import datarobotx as drx
df = pd.read_csv('https://s3.amazonaws.com/datarobot_public_datasets/madelon_combined_80.csv')
base_model = drx.AutoMLModel()
model = drx.ColumnReduceModel(base_model,
                              ranking_ensemble_size=3,
                              initial_lives=2)
model.fit(df, target='y')
Run on existing project#
project_id = '123456789'
model = drx.ColumnReduceModel.run_column_reduction(project_id,
                                                   ranking_ensemble_size=3,
                                                   initial_lives=2)
Retrieve features from the champion model#
features = model.features
Predict#
test_df = pd.read_csv('https://s3.amazonaws.com/datarobot_public_datasets/madelon_combined_20.csv')
predictions = model.predict(test_df)
class_probs = model.predict_proba(test_df)
Deploy#
deployment = model.deploy()
API Reference#
| ColumnReduceModel | Column reduction orchestrator. |
| | DataRobot ML Ops deployment. |