SelfDiscoveryModel

class datarobotx.SelfDiscoveryModel(base_model, feature_windows=None)

Self-join feature discovery orchestrator.

Partitions a single training dataset into two datasets that DataRobot feature discovery then joins using the provided join keys. This allows feature discovery to synthetically create and explore feature aggregations and transformations from a single training dataset.

For OTV problems, the primary dataset includes the target variable, the user-provided join keys, and the date feature. For non-OTV problems, all original features are also included in the primary dataset.

The secondary dataset includes the join keys, the date feature (if applicable) and all non-target features. The secondary dataset will be automatically created as a new AI catalog entry.
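The column split described above can be sketched in plain Python. This is an illustration only, not the library's implementation: split_columns is a hypothetical helper, the column names are invented, and the sketch assumes that passing a date column means the problem is OTV.

```python
def split_columns(columns, target, keys, date_column=None):
    """Illustrative only: mirror the primary/secondary column sets described above."""
    keys = [keys] if isinstance(keys, str) else list(keys)
    # Assumption for this sketch: a date column implies an OTV problem.
    is_otv = date_column is not None
    shared = keys + ([date_column] if is_otv else [])
    if is_otv:
        # OTV: target, join keys and the date feature only.
        primary = [target] + shared
    else:
        # Non-OTV: all original features are kept in the primary dataset.
        primary = list(columns)
    # Secondary: join keys, date feature (if applicable), all non-target features.
    secondary = shared + [c for c in columns if c != target and c not in shared]
    return primary, secondary


# Hypothetical column names, used only for illustration.
columns = ["customer_id", "purchase_date", "amount", "channel", "churned"]

otv_primary, otv_secondary = split_columns(
    columns, target="churned", keys="customer_id", date_column="purchase_date"
)
print(otv_primary)    # ['churned', 'customer_id', 'purchase_date']
print(otv_secondary)  # ['customer_id', 'purchase_date', 'amount', 'channel']

plain_primary, plain_secondary = split_columns(
    columns, target="churned", keys="customer_id"
)
print(plain_primary)  # all five original columns
```

In both cases the target never appears in the secondary dataset, which is what lets feature discovery aggregate over the remaining columns.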

Autopilot orchestration is delegated to the provided base model.

Parameters:
  • base_model (AutopilotModel or IntraProjectModel) – Base model for orchestrating Autopilot after feature discovery. Clustering and AutoTS are not supported.

  • feature_windows (tuple or list of tuple, optional) –

    Only applicable for OTV problems. A tuple of three elements, (start, end, unit), that governs feature discovery:

      • start (int) – How far back to look when aggregating features.

      • end (int) – Stopping point for the aggregation.

      • unit (str) – Unit of time to use for the aggregation: 'MILLISECOND', 'SECOND', 'MINUTE', 'HOUR', 'DAY', 'WEEK', 'MONTH', 'QUARTER' or 'YEAR'.

    Example: (-14, -7, 'DAY') will create aggregated features from 14 days ago until 7 days ago.

    Can also be provided as a list of multiple feature discovery window tuples.
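As a rough sketch of the accepted shapes, the helper below normalizes either form (a single tuple or a list of tuples) into a list and checks each element against the description above. normalize_feature_windows is a hypothetical name and is not part of datarobotx; the library may validate its input differently.

```python
# Units listed in the feature_windows parameter description.
VALID_UNITS = {
    "MILLISECOND", "SECOND", "MINUTE", "HOUR", "DAY",
    "WEEK", "MONTH", "QUARTER", "YEAR",
}


def normalize_feature_windows(feature_windows):
    """Hypothetical helper: return a list of checked (start, end, unit) tuples."""
    if feature_windows is None:
        return []
    # A single window is a tuple; several windows are a list of tuples.
    if isinstance(feature_windows, tuple):
        windows = [feature_windows]
    else:
        windows = list(feature_windows)
    for window in windows:
        if not isinstance(window, tuple) or len(window) != 3:
            raise ValueError(f"expected a (start, end, unit) tuple, got {window!r}")
        start, end, unit = window
        if not isinstance(start, int) or not isinstance(end, int):
            raise ValueError(f"start and end must be int, got {window!r}")
        if unit not in VALID_UNITS:
            raise ValueError(f"unsupported unit {unit!r}")
    return windows


single = normalize_feature_windows((-14, -7, "DAY"))
several = normalize_feature_windows([(-14, -7, "DAY"), (-4, 0, "WEEK")])
print(single)   # [(-14, -7, 'DAY')]
print(several)  # [(-14, -7, 'DAY'), (-4, 0, 'WEEK')]
```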

Inherited attributes:

base_model

Base model used for fitting.

dr_model

DataRobot python client datarobot.Model object for the present champion.

dr_project

DataRobot python client datarobot.Project object.

Methods:

deploy([wait_for_autopilot, name])

Deploy the model into ML Ops.

fit(X, *args, keys[, kia_features, ...])

Fit self-join feature discovery model.

get_derived_features()

Retrieve feature discovery derived features.

get_derived_sql()

Retrieve SQL recipes for producing derived features.

predict(X, *args, **kwargs)

Predict using the base model.

predict_proba(X, *args, **kwargs)

Predict using the base model.

Inherited methods:

get_params()

Retrieve configuration parameters for the intra-project model.

set_params(**kwargs)

Set configuration parameters for the intra-project model.

share(emails)

Share a project with other users.

property base_model: ModelOperator

Base model used for fitting.

Returns:

Base model instance

Return type:

AutopilotModel or IntraProjectModel

deploy(wait_for_autopilot=False, name=None)

Deploy the model into ML Ops.

Parameters:
  • wait_for_autopilot (bool, optional, default=False) – If True, wait for Autopilot to complete before deploying the model. In non-notebook environments, fit() will always block until complete.

  • name (str, optional, default=None) – Name for the deployment. If None, a name will be generated.

Returns:

Resulting ML Ops deployment

Return type:

Deployment

property dr_model: Model

DataRobot python client datarobot.Model object for the present champion.

Returns:

datarobot.Model object associated with this drx model

Return type:

datarobot.Model

property dr_project: Project

DataRobot python client datarobot.Project object.

Returns:

datarobot.Project object associated with this drx model

Return type:

datarobot.Project

fit(X, *args, keys, kia_features=None, datetime_partition_column=None, **kwargs)

Fit self-join feature discovery model.

AutoTS and Clustering base models are not supported for feature discovery.

Parameters:
  • X (pandas.DataFrame) – Training dataset for challenger models

  • *args (Any) – Positional arguments to be passed to the base model fit()

  • keys (str or list[str]) – Column name(s) of the feature(s) to be used for the self-join. Can be a scalar string or a list of strings.

  • kia_features (list, optional) – A list of features that will be included in the primary dataset of the feature discovery model. These will be treated as primary features and excluded from feature discovery feature engineering.

  • datetime_partition_column (str, optional) – Column name of the feature to be used as the temporal key for creating a lookback window for feature discovery.

  • **kwargs (Any) – Keyword arguments to be passed to the base model fit()

Return type:

SelfDiscoveryModel
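A minimal sketch of how the constructor and fit() signatures above fit together. fit_self_join is a hypothetical wrapper, not part of datarobotx, and it takes an already constructed base model so that it makes no assumption about how that model is created. Calling it requires the datarobotx package and configured DataRobot credentials.

```python
def fit_self_join(base_model, training_df, keys, date_column=None,
                  windows=None, **fit_kwargs):
    """Hypothetical convenience wrapper around the calls documented above."""
    # Imported lazily so that defining this function does not require the
    # package; calling it needs datarobotx installed and credentials set up.
    import datarobotx as drx

    model = drx.SelfDiscoveryModel(base_model, feature_windows=windows)
    # keys is keyword-only in fit(); any extra keyword arguments are passed
    # through to the base model's fit().
    model.fit(
        training_df,
        keys=keys,
        datetime_partition_column=date_column,
        **fit_kwargs,
    )
    return model
```

Once fitting has completed, get_derived_features() and get_derived_sql() can be called on the returned object to inspect what feature discovery produced.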

get_derived_features()

Retrieve feature discovery derived features.

Returns:

df – DataFrame containing the derived features from the feature discovery process.

Return type:

FutureDataFrame

get_derived_sql()

Retrieve SQL recipes for producing derived features.

Returns:

String with the SQL code for generating the derived features from the feature discovery process. Pass it to print() for more readable output.

Return type:

str

get_params()

Retrieve configuration parameters for the intra-project model.

Returns:

config – Configuration object containing the parameters for the intra-project model

Return type:

dict

Notes

Access configuration parameters for the underlying base model by calling get_params() on the base_model attribute.

predict(X, *args, **kwargs)

Predict using the base model.

Return type:

DataFrame

predict_proba(X, *args, **kwargs)

Predict using the base model.

Return type:

DataFrame

set_params(**kwargs)

Set configuration parameters for the intra-project model.

Parameters:

**kwargs (Any) – Configuration parameters to be set or updated for this model.

Returns:

self – IntraProjectModel instance

Return type:

IntraProjectModel

Notes

Configuration parameters for the underlying base model can be set by calling set_params() on the base_model attribute.

share(emails)

Share a project with other users. Each user is given the owner role on the project.

Parameters:

emails (Union[str, list]) – An email address, or a list of email addresses, of the users to share with

Return type:

None