ColumnReduceModel

class datarobotx.ColumnReduceModel(base_model, ranking_ensemble_size=5, initial_retain_ratio=0.95, initial_lives=3)

Column reduction orchestrator.

Iteratively trains challenger models on increasingly column-reduced training data until diminishing returns on model performance are reached. Uses Feature Importance Rank Ensembling (FIRE) for column reduction.

Delegates training on column-reduced data to the provided base model. Blenders and frozen models are excluded from champion model consideration.
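The rank-ensembling idea can be pictured with a small, self-contained sketch. It is not the datarobotx implementation: the helper name `median_rank_order`, the input layout (one dict of feature name to importance score per model), and the alphabetical tie-breaking are all assumptions made for illustration.

```python
from statistics import median


def median_rank_order(importances_per_model):
    """Order features by their median importance rank across models.

    `importances_per_model` holds one dict per model, mapping
    feature name -> importance score (higher means more important).
    Hypothetical helper for illustration; not part of datarobotx.
    """
    ranks = {}
    for importances in importances_per_model:
        # Rank 1 is the most important feature within this model.
        ordered = sorted(importances, key=lambda f: (-importances[f], f))
        for rank, feature in enumerate(ordered, start=1):
            ranks.setdefault(feature, []).append(rank)
    # Lower median rank first; ties are broken alphabetically (assumption).
    return sorted(ranks, key=lambda f: (median(ranks[f]), f))


# Made-up importance scores for three models.
models = [
    {"age": 0.50, "income": 0.30, "zip": 0.20},
    {"age": 0.40, "income": 0.45, "zip": 0.15},
    {"age": 0.60, "income": 0.25, "zip": 0.15},
]
ordered_features = median_rank_order(models)
```

Using the median rather than the mean keeps a single model with an unusual ranking from dominating the ordering.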

Parameters:
  • base_model (AutopilotModel or IntraProjectModel) – Base model to fit on column-reduced training data

  • ranking_ensemble_size (int, default=5) – Number of top models from the leaderboard to include in the ensemble when computing the median feature importance rank for each feature

  • initial_retain_ratio (float, default=0.95) – Initial fraction (a percentage expressed as a decimal) of cumulative feature importance to retain when performing column reduction

  • initial_lives (int, default=3) – Stopping criterion; number of reduction iterations to complete without establishing a new champion model
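To make the roles of initial_retain_ratio and initial_lives more concrete, here is a simplified sketch under stated assumptions. Both helper names are hypothetical, higher scores are treated as better, and the lives counter is never reset after a new champion; the library's actual bookkeeping may differ on any of these points.

```python
def retain_by_cumulative_importance(importances, retain_ratio=0.95):
    """Keep top features until `retain_ratio` of total importance is covered.

    `importances` maps feature name -> non-negative importance score.
    Hypothetical helper for illustration; not part of datarobotx.
    """
    total = sum(importances.values())
    kept, running = [], 0.0
    for feature in sorted(importances, key=lambda f: (-importances[f], f)):
        # Stop once the features kept so far already cover the target share.
        if total > 0 and running / total >= retain_ratio:
            break
        kept.append(feature)
        running += importances[feature]
    return kept


def iterations_until_out_of_lives(challenger_scores, initial_score, initial_lives=3):
    """Return (iterations run, best score) for a sequence of challenger scores.

    A challenger that fails to beat the current champion costs one life.
    Assumption: the counter is not reset when a new champion is found.
    """
    best, lives, iterations = initial_score, initial_lives, 0
    for score in challenger_scores:
        if lives == 0:
            break
        iterations += 1
        if score > best:
            best = score  # this challenger becomes the new champion
        else:
            lives -= 1
    return iterations, best


kept = retain_by_cumulative_importance({"a": 60, "b": 30, "c": 8, "d": 2})
result = iterations_until_out_of_lives(
    [0.80, 0.79, 0.82, 0.81, 0.80, 0.90], initial_score=0.78
)
```

In this toy run the last challenger (0.90) is never trained, because the third failed iteration has already used up the remaining life.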

Attributes:

features

List of features used by the current best model.

Inherited attributes:

base_model

Base model used for fitting.

dr_model

DataRobot python client datarobot.Model object for the present champion.

dr_project

DataRobot python client datarobot.Project object.

Methods:

fit(*args, **kwargs)

Fit column-reduced challenger models using the underlying base model.

run_column_reduction(project_id[, ...])

Run feature reduction on the provided project iteratively.

Inherited methods:

deploy([wait_for_autopilot, name])

Deploy the model into ML Ops.

get_params()

Retrieve configuration parameters for the intra-project model.

predict(X[, wait_for_autopilot])

Make batch predictions using the present champion.

predict_proba(X[, wait_for_autopilot])

Calculate class probabilities using the present champion.

set_params(**kwargs)

Set configuration parameters for the intra-project model.

share(emails)

Share a project with other users.

property base_model: ModelOperator

Base model used for fitting.

Returns:

Base model instance

Return type:

AutopilotModel or IntraProjectModel

deploy(wait_for_autopilot=False, name=None)

Deploy the model into ML Ops.

Parameters:
  • wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before deploying the model. In non-notebook environments, fit() will always block until complete

  • name (str, optional, default=None) – Name for the deployment. If None, a name will be generated

Returns:

Resulting ML Ops deployment

Return type:

Deployment

property dr_model: datarobot.Model

DataRobot python client datarobot.Model object for the present champion.

Returns:

datarobot.Model object associated with this drx model

Return type:

datarobot.Model

property dr_project: datarobot.Project

DataRobot python client datarobot.Project object.

Returns:

datarobot.Project object associated with this drx.Model

Return type:

datarobot.Project

property features: List[str] | None

List of features used by the current best model.

Returns:

Column names of features used in current best model

Return type:

list

fit(*args, **kwargs)

Fit column-reduced challenger models using the underlying base model.

Parameters:
  • args – Arguments to be passed to the base model fit()

  • kwargs – Keyword arguments to be passed to the base model fit()

Return type:

ColumnReduceModel
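A minimal usage sketch follows. It assumes `drx.AutoMLModel` as the base model (any AutopilotModel or IntraProjectModel should fit the documented signature) and that the base model's fit() accepts a `target` keyword; the wrapper function name and its arguments are hypothetical. Running it requires a configured DataRobot connection.

```python
def fit_column_reduced(df, target):
    """Fit a ColumnReduceModel on `df`, delegating training to a base model.

    Hypothetical wrapper for illustration only.
    """
    # Imported lazily so the function can be defined without the package.
    import datarobotx as drx

    # Assumption: AutoMLModel is used as the base model here.
    base_model = drx.AutoMLModel()
    model = drx.ColumnReduceModel(base_model, initial_lives=3)
    # Positional and keyword arguments are forwarded to base_model.fit().
    model.fit(df, target=target)
    return model
```

After fitting, the documented `features` attribute lists the columns used by the current best model.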

get_params()

Retrieve configuration parameters for the intra-project model.

Returns:

config – Configuration object containing the parameters for the intra-project model

Return type:

dict

Notes

Access configuration parameters for the underlying base model by calling get_params() on the base_model attribute

predict(X, wait_for_autopilot=False, **kwargs)

Make batch predictions using the present champion.

Predictions are calculated asynchronously: the call returns immediately, and the returned DataFrame is populated with data once the predictions are complete.

Predictions are made within the project containing the model using modeling workers. For real-time predictions, first deploy the model.

Parameters:
  • X (pandas.DataFrame or str) – Dataset to be scored - target column can be included or omitted. If str, can be AI catalog dataset id or name (if unambiguous)

  • wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before making predictions. In non-notebook environments, fit() will always block until complete

  • **kwargs (Any) – Other keyword arguments to pass to the _predict function

Returns:

Resulting predictions (contained in the column ‘predictions’). Returned immediately and updated automatically when results are completed.

Return type:

pandas.DataFrame
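As a brief illustration of the call, the sketch below wraps predict() in a hypothetical helper. It assumes `model` is an already fitted ColumnReduceModel and uses only the parameters documented above.

```python
def score_with_champion(model, X):
    """Request batch predictions from the current champion of a fitted model.

    Hypothetical helper for illustration; `model` is assumed to be a
    fitted ColumnReduceModel and `X` a DataFrame or AI catalog dataset.
    """
    # wait_for_autopilot=True waits for autopilot to finish before scoring.
    predictions = model.predict(X, wait_for_autopilot=True)
    # Per the documentation, the returned DataFrame is filled in once the
    # asynchronous prediction job completes.
    return predictions
```

Because scoring runs on modeling workers inside the project, this pattern suits batch use; real-time scoring calls for a deployment instead.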

predict_proba(X, wait_for_autopilot=False, **kwargs)

Calculate class probabilities using the present champion.

Only available for classifier and clustering models.

Parameters:
  • X (pandas.DataFrame or str) – Dataset to be scored - target column can be included or omitted. If str, can be AI catalog dataset id or name (if unambiguous)

  • wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before making predictions. In non-notebook environments, fit() will always block until complete

  • **kwargs (Any) – Other keyword arguments to pass to the _predict function

Returns:

Resulting predictions; probabilities for each label are contained in the column ‘class_{label}’; returned immediately, updated automatically when results are completed.

Return type:

pandas.DataFrame

See also

predict

classmethod run_column_reduction(project_id, ranking_ensemble_size=5, initial_retain_ratio=0.95, initial_lives=3)

Run feature reduction on the provided project iteratively.

Parameters:
  • project_id (str) – ID of an existing project to fit on column-reduced training data

  • ranking_ensemble_size (int, default=5) – Number of top models from the leaderboard to include in the ensemble when computing the median feature importance rank for each feature

  • initial_retain_ratio (float, default=0.95) – Initial fraction (a percentage expressed as a decimal) of cumulative feature importance to retain when performing column reduction

  • initial_lives (int, default=3) – Stopping criterion; number of reduction iterations to complete without establishing a new champion model

Returns:

Model object that can be used to make predictions and deploy models

Return type:

ColumnReduceModel

Examples

>>> from datarobotx.models.colreduce import ColumnReduceModel
>>> project_id = "123456"
>>> colreduce_model = ColumnReduceModel.run_column_reduction(project_id)

set_params(**kwargs)

Set configuration parameters for the intra-project model.

Parameters:

**kwargs (Any) – Configuration parameters to be set or updated for this model.

Returns:

self – IntraProjectModel instance

Return type:

IntraProjectModel

Notes

Configuration parameters for the underlying base model can be set by calling set_params() on the base_model attribute
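The distinction in the note above can be sketched as follows. The helper name is hypothetical, and no specific parameter names are assumed: whatever keyword arguments the caller supplies are passed straight through to the base model.

```python
def update_base_model_params(model, **params):
    """Apply `params` to the underlying base model and return its configuration.

    Hypothetical helper for illustration; `model` is assumed to be a
    ColumnReduceModel (or any object exposing a `base_model` attribute).
    """
    # set_params()/get_params() called on `model` itself address the wrapper;
    # the base model's settings are reached through the base_model attribute.
    model.base_model.set_params(**params)
    return model.base_model.get_params()
```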

share(emails)

Share a project with other users. Sets the user role as an owner of the project.

Parameters:

emails (Union[str, list]) – An email address, or a list of email addresses, of users to share with

Return type:

None