SelfDiscoveryModel
- class datarobotx.SelfDiscoveryModel(base_model, feature_windows=None)
Self-join feature discovery orchestrator.
Partitions a single training dataset into two datasets that will be joined by DataRobot feature discovery using the provided join keys. This allows feature discovery to synthetically create and explore feature aggregations and transformations on a single training dataset.
For OTV problems, the primary dataset includes the target variable, the user-provided join keys, and the date feature. For non-OTV problems, all original features are also included in the primary dataset.
The secondary dataset includes the join keys, the date feature (if applicable), and all non-target features. The secondary dataset will be automatically created as a new AI catalog entry.
Autopilot orchestration is delegated to the provided base model.
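The partitioning described above can be sketched in plain Python. This is only an illustration of which columns land in which dataset, not the library's implementation. The helper name `partition_columns` and the sample column names are hypothetical, and the sketch assumes a problem counts as OTV exactly when a date column is supplied.

```python
from typing import List, Optional, Sequence, Tuple


def partition_columns(
    columns: Sequence[str],
    target: str,
    keys: Sequence[str],
    date_col: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Return (primary, secondary) column lists for a self-join.

    Illustrative only: assumes the problem is OTV exactly when date_col is given.
    """
    if date_col is not None:
        # OTV: primary keeps only the target, the join keys and the date feature.
        wanted = {target, date_col, *keys}
        primary = [c for c in columns if c in wanted]
    else:
        # Non-OTV: primary keeps every original column.
        primary = list(columns)
    # Secondary: join keys, date feature (if any) and all non-target features,
    # which amounts to every column except the target.
    secondary = [c for c in columns if c != target]
    return primary, secondary


# Hypothetical column names, used only for demonstration.
columns = ["customer_id", "date", "amount", "region", "is_bad"]
otv_primary, otv_secondary = partition_columns(
    columns, "is_bad", ["customer_id"], "date"
)
flat_primary, flat_secondary = partition_columns(columns, "is_bad", ["customer_id"])
```

In the OTV case the primary dataset is deliberately narrow, so every other feature reaches the model only through the aggregations that feature discovery derives from the secondary dataset.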
- Parameters:
base_model (AutopilotModel or IntraProjectModel) – Base model for orchestrating Autopilot after feature discovery. Clustering and AutoTS are not supported.
feature_windows (tuple or list of tuple, optional) –
Only applicable for OTV problems. A tuple with the following three elements to govern feature discovery: (start, end, unit)
start (int): how far back to look to aggregate features.
end (int): stopping point for aggregation.
unit (str): unit of time to use for aggregation, e.g. 'MILLISECOND', 'SECOND', 'MINUTE', 'HOUR', 'DAY', 'WEEK', 'MONTH', 'QUARTER', 'YEAR'.
Example: (-14, -7, 'DAY') will create aggregated features from 14 days ago until 7 days ago.
Can also be provided as a list of multiple feature discovery window tuples.
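To make the (start, end, unit) convention concrete, here is a small sketch that accepts either a single tuple or a list of tuples and converts a window into calendar bounds. The helpers `normalize_feature_windows` and `window_bounds` are hypothetical and not part of datarobotx. The conversion only covers units that `datetime.timedelta` supports, and whether the service treats each bound as inclusive is not something this sketch asserts.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple, Union

Window = Tuple[int, int, str]

# Units named in the parameter description above.
VALID_UNITS = (
    "MILLISECOND", "SECOND", "MINUTE", "HOUR", "DAY",
    "WEEK", "MONTH", "QUARTER", "YEAR",
)

# Units with a fixed length that timedelta can represent directly.
_TIMEDELTA_ARG = {
    "MILLISECOND": "milliseconds",
    "SECOND": "seconds",
    "MINUTE": "minutes",
    "HOUR": "hours",
    "DAY": "days",
    "WEEK": "weeks",
}


def normalize_feature_windows(
    feature_windows: Union[Window, Sequence[Window]],
) -> List[Window]:
    """Accept one (start, end, unit) tuple or a list of them; return a list."""
    if isinstance(feature_windows, tuple):
        feature_windows = [feature_windows]
    windows = []
    for start, end, unit in feature_windows:
        if unit not in VALID_UNITS:
            raise ValueError(f"Unsupported unit: {unit!r}")
        windows.append((start, end, unit))
    return windows


def window_bounds(reference: datetime, window: Window) -> Tuple[datetime, datetime]:
    """Translate a window into (earliest, latest) datetimes relative to reference."""
    start, end, unit = window
    if unit not in _TIMEDELTA_ARG:
        raise ValueError(f"{unit!r} has no fixed length; not handled in this sketch")
    arg = _TIMEDELTA_ARG[unit]
    return (
        reference + timedelta(**{arg: start}),
        reference + timedelta(**{arg: end}),
    )


single = normalize_feature_windows((-14, -7, "DAY"))
several = normalize_feature_windows([(-14, -7, "DAY"), (-4, 0, "WEEK")])
bounds = window_bounds(datetime(2024, 1, 15), (-14, -7, "DAY"))
```

For a reference date of 15 January 2024, the window (-14, -7, 'DAY') spans 1 January to 8 January, matching the "14 days ago until 7 days ago" reading.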
Inherited attributes:
base_model
Base model used for fitting.
dr_model
DataRobot python client datarobot.Model object for the present champion.
dr_project
DataRobot python client datarobot.Project object.
Methods:
deploy
([wait_for_autopilot, name]) Deploy the model into ML Ops.
fit
(X, *args, keys[, kia_features, ...]) Fit self-join feature discovery model.
get_derived_features
() Retrieve feature discovery derived features.
get_derived_sql
() Retrieve SQL recipes for producing derived features.
predict
(X, *args, **kwargs) Predict using the base model.
predict_proba
(X, *args, **kwargs) Predict using the base model.
Inherited methods:
get_params
() Retrieve configuration parameters for the intra-project model.
set_params
(**kwargs) Set configuration parameters for the intra-project model.
share
(emails) Share a project with other users.
- property base_model: ModelOperator
Base model used for fitting.
- Returns:
Base model instance
- Return type:
AutopilotModel or IntraProjectModel
- deploy(wait_for_autopilot=False, name=None)
Deploy the model into ML Ops.
- Parameters:
wait_for_autopilot (bool, optional, default=False) – If True, wait for Autopilot to complete before deploying the model. In non-notebook environments, fit() will always block until complete.
name (str, optional, default=None) – Name for the deployment. If None, a name will be generated.
- Returns:
Resulting ML Ops deployment
- Return type:
Deployment
- property dr_model: Model
DataRobot python client datarobot.Model object for the present champion.
- Returns:
datarobot.Model object associated with this drx model
- Return type:
datarobot.Model
- property dr_project: Project
DataRobot python client datarobot.Project object.
- Returns:
datarobot.Project object associated with this drx model
- Return type:
datarobot.Project
- fit(X, *args, keys, kia_features=None, datetime_partition_column=None, **kwargs)
Fit self-join feature discovery model.
AutoTS and Clustering base models are not supported for feature discovery.
- Parameters:
X (pandas.DataFrame) – Training dataset for challenger models
*args (Any) – Positional arguments to be passed to the base model fit()
keys (str or list[str]) – Column name(s) of the feature(s) to be used for the self-join. Can be a scalar string or a list of strings.
kia_features (list, optional) – A list of features that will be included in the primary dataset of the feature discovery model. These will be treated as primary features and excluded from feature discovery engineering.
datetime_partition_column (str, optional) – Column name of the feature to be used as the temporal key for creating a lookback window for feature discovery.
**kwargs (Any) – Keyword arguments to be passed to the base model fit()
- Return type:
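In the signature above, `keys` follows `*args`, so Python treats it as a keyword-only argument, and anything not consumed by the named parameters is forwarded to the base model. The stand-in below mirrors that signature to show how a call is routed. `fit_stub` is hypothetical and performs no training, and passing `target=` through to the base model is an assumption made for illustration.

```python
from typing import Any, Dict, List, Optional, Union


def fit_stub(
    X: Any,
    *args: Any,
    keys: Union[str, List[str]],
    kia_features: Optional[List[str]] = None,
    datetime_partition_column: Optional[str] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Hypothetical stand-in that mirrors the documented fit() signature.

    It only reports how the arguments would be routed; nothing is fitted.
    """
    # A scalar string and a list of strings are both accepted for keys.
    join_keys = [keys] if isinstance(keys, str) else list(keys)
    return {
        "join_keys": join_keys,
        "kia_features": list(kia_features or []),
        "datetime_partition_column": datetime_partition_column,
        # Everything else would be handed to the base model's fit().
        "base_model_args": args,
        "base_model_kwargs": kwargs,
    }


rows = [{"customer_id": 1, "date": "2024-01-01", "is_bad": 0}]

# `target` is an assumed base-model keyword, shown only to illustrate forwarding.
routed = fit_stub(rows, keys="customer_id", datetime_partition_column="date",
                  target="is_bad")

# Passing the key positionally sends it into *args, so `keys` is still missing.
try:
    fit_stub(rows, "customer_id")
    positional_keys_rejected = False
except TypeError:
    positional_keys_rejected = True
```

Supplying `datetime_partition_column` is what provides the temporal key for a lookback window, so it pairs with the `feature_windows` argument of the constructor.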
- get_derived_features()
Retrieve feature discovery derived features.
- Returns:
df – DataFrame containing the derived features from the feature discovery process.
- Return type:
FutureDataFrame
- get_derived_sql()
Retrieve SQL recipes for producing derived features.
- Returns:
String with the SQL code for generating the derived features from the feature discovery process. Use with print() for more readable output.
- Return type:
str
- get_params()
Retrieve configuration parameters for the intra-project model.
- Returns:
config – Configuration object containing the parameters for the intra-project model
- Return type:
Notes
Access configuration parameters for the underlying base model by calling get_params() on the base_model attribute.
- set_params(**kwargs)
Set configuration parameters for the intra-project model.
- Parameters:
**kwargs (Any) – Configuration parameters to be set or updated for this model.
- Returns:
self – IntraProjectModel instance
- Return type:
IntraProjectModel
Notes
Configuration parameters for the underlying base model can be set by calling set_params() on the base_model attribute.