FeatureDiscoveryModel
- class datarobotx.FeatureDiscoveryModel(base_model, remove_udfs=False)
Feature discovery orchestrator.
Autopilot orchestration is delegated to the provided base model.
Builds features on secondary datasets before running an autopilot model. Primary and secondary datasets can be provided as pandas dataframes or AI catalog entries. Users can also provide a relationship configuration id built using the Python SDK.
- Parameters:
base_model (AutopilotModel or IntraProjectModel) – Base model for orchestrating Autopilot after feature discovery. Clustering and AutoTS are not supported.
remove_udfs (bool) – Whether feature discovery should forgo deriving features using UDFs.
Examples
>>> import pandas as pd
>>> import datarobotx as drx
>>> df_target = pd.read_csv(
...     "https://s3.amazonaws.com/datarobot_public_datasets/drx/Lending+Club+Target.csv"
... )
>>> df_transactions = pd.read_csv(
...     "https://s3.amazonaws.com/datarobot_public_datasets/drx/Lending+Club+Transactions.csv"
... )
>>> base_model = drx.AutoMLModel()
>>> model = drx.FeatureDiscoveryModel(base_model)
>>> transaction_relationship = drx.Relationship(
...     df_transactions,
...     keys="CustomerID",
...     temporal_key="Date",
...     feature_windows=[(-14, -7, "DAY"), (-7, 0, "DAY")],
...     dataset_name="transactions",
... )
>>> model.fit(
...     df_target,
...     target="BadLoan",
...     feature_engineering_prediction_point="Date",
...     relationships_configuration=[transaction_relationship],
... )
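To make the `feature_windows` argument above concrete, the following standard-library sketch shows one plausible reading of a window such as `(-14, -7, "DAY")`: secondary rows are aggregated when their timestamp falls between 14 and 7 days before the prediction point. This is an illustration only, not DataRobot's implementation. The helper name `window_sum`, the sample data, and the half-open boundary convention (start inclusive, end exclusive) are all assumptions.

```python
from datetime import date, timedelta


def window_sum(rows, key, prediction_point, start_days, end_days):
    """Sum amounts for `key` whose date lies in
    [prediction_point + start_days, prediction_point + end_days).

    Hypothetical helper: the boundary convention is an assumption, not
    documented DataRobot behavior.
    """
    lo = prediction_point + timedelta(days=start_days)
    hi = prediction_point + timedelta(days=end_days)
    return sum(
        amount
        for row_key, row_date, amount in rows
        if row_key == key and lo <= row_date < hi
    )


# Made-up transactions: (CustomerID, Date, Amount)
transactions = [
    ("C1", date(2024, 1, 3), 50.0),   # 12 days before the prediction point
    ("C1", date(2024, 1, 10), 20.0),  # 5 days before
    ("C1", date(2024, 1, 14), 5.0),   # 1 day before
    ("C2", date(2024, 1, 12), 99.0),  # different customer, never counted for C1
]
prediction_point = date(2024, 1, 15)

older = window_sum(transactions, "C1", prediction_point, -14, -7)
recent = window_sum(transactions, "C1", prediction_point, -7, 0)
print(older, recent)
```

Each window yields its own derived value per primary row, which is why the example passes two non-overlapping windows.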
Inherited attributes:
base_model – Base model used for fitting.
dr_model – DataRobot python client datarobot.Model object for the present champion.
dr_project – DataRobot python client datarobot.Project object.
Methods:
fit(X, relationships_configuration, *args[, ...]) – Fit a feature discovery model.
get_derived_features() – Retrieve feature discovery derived features.
get_derived_sql() – Retrieve SQL recipes for producing derived features.
Inherited methods:
deploy([wait_for_autopilot, name]) – Deploy the model into ML Ops.
get_params() – Retrieve configuration parameters for the intra-project model.
predict(X[, wait_for_autopilot]) – Make batch predictions using the present champion.
predict_proba(X[, wait_for_autopilot]) – Calculate class probabilities using the present champion.
set_params(**kwargs) – Set configuration parameters for the intra-project model.
share(emails) – Share a project with other users.
- property base_model: ModelOperator
Base model used for fitting.
- Returns:
Base model instance
- Return type:
AutopilotModel or IntraProjectModel
- deploy(wait_for_autopilot=False, name=None)
Deploy the model into ML Ops.
- Parameters:
wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before deploying the model. In non-notebook environments, fit() will always block until complete.
name (str, optional, default=None) – Name for the deployment. If None, a name will be generated.
- Returns:
Resulting ML Ops deployment
- Return type:
Deployment
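A typical call sequence can be sketched as a small helper. It relies only on the `fit()` and `deploy()` signatures documented on this page. The helper name `fit_and_deploy` is hypothetical, and `model` is assumed to be a `FeatureDiscoveryModel` constructed elsewhere.

```python
def fit_and_deploy(model, df_target, relationships, target, prediction_point, name=None):
    """Fit a feature discovery model, then deploy the present champion.

    Hypothetical helper: `model` is assumed to expose the fit() and
    deploy() methods documented on this page.
    """
    model.fit(
        df_target,
        relationships,
        target=target,
        feature_engineering_prediction_point=prediction_point,
    )
    # wait_for_autopilot=True waits for Autopilot to complete before deploying.
    return model.deploy(wait_for_autopilot=True, name=name)
```

Passing `wait_for_autopilot=True` is a deliberate choice here: it avoids deploying an interim champion while Autopilot is still running.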
- property dr_model: Model
DataRobot python client datarobot.Model object for the present champion.
- Returns:
datarobot.Model object associated with this drx model
- Return type:
datarobot.Model
- property dr_project: Project
DataRobot python client datarobot.Project object.
- Returns:
datarobot.Project object associated with this drx.Model
- Return type:
datarobot.Project
- fit(X, relationships_configuration, *args, target=None, feature_engineering_prediction_point=None, **kwargs)
Fit a feature discovery model.
Applies automatic feature engineering and feature selection to the dataset before running the base model. Note that AutoTS and Clustering base models are not supported for feature discovery.
- Parameters:
X (pandas.DataFrame or str) – Training dataset for challenger models
relationships_configuration (Union[str, Relationship, List[Relationship]]) – Secondary dataset(s) relationship configuration. For more complex relationships, users can instead pass the relationship configuration id of a relationship configured using the official DataRobot Python client.
*args (Any) – Positional arguments to be passed to the base model fit()
target (str, optional) – Column name from the dataset to be used as the target variable
feature_engineering_prediction_point (str, optional) – Column name of the feature in the target dataset to join on by time. Must be set in order to derive time-based features.
**kwargs (Any) – Keyword arguments to be passed to the base model fit()
- Return type:
- get_derived_features()
Retrieve feature discovery derived features.
- Returns:
df – DataFrame containing the derived features from the feature discovery process.
- Return type:
FutureDataFrame
- get_derived_sql()
Retrieve SQL recipes for producing derived features.
- Returns:
String with the SQL code for generating the derived features from the feature discovery process. Use with print() for more readable output.
- Return type:
str
- get_params()
Retrieve configuration parameters for the intra-project model.
- Returns:
config – Configuration object containing the parameters for the intra-project model
- Return type:
Notes
Access configuration parameters for the underlying base model by calling get_params() on the base_model attribute.
- predict(X, wait_for_autopilot=False, **kwargs)
Make batch predictions using the present champion.
Predictions are calculated asynchronously: the call returns immediately, and the returned DataFrame is populated with data once predictions are complete.
Predictions are made within the project containing the model using modeling workers. For real-time predictions, first deploy the model.
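The "return immediately, fill in later" behavior described above can be pictured with a standard-library stand-in. This is not how datarobotx implements its result object. The class `LazyResult` and the function `slow_score` are hypothetical, and a thread pool merely imitates remote scoring.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class LazyResult:
    """Hypothetical placeholder whose data arrives after construction."""

    def __init__(self, future):
        self._future = future

    @property
    def ready(self):
        # True once the background work has finished.
        return self._future.done()

    def result(self):
        # Blocks until the background work has finished, then returns its data.
        return self._future.result()


def slow_score(rows):
    time.sleep(0.1)  # imitate remote scoring latency
    return [{"predictions": r * 2} for r in rows]


executor = ThreadPoolExecutor(max_workers=1)
scores = LazyResult(executor.submit(slow_score, [1, 2, 3]))  # returns without waiting
values = scores.result()  # waits for completion
executor.shutdown()
print(values)
```

The caller keeps a handle that is usable right away and only pays the waiting cost when the data is actually needed.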
- Parameters:
X (pandas.DataFrame or str) – Dataset to be scored - target column can be included or omitted. If str, can be AI catalog dataset id or name (if unambiguous)
wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before making predictions. In non-notebook environments, fit() will always block until complete.
**kwargs (Any) – Other keyword arguments to pass to the _predict function
- Returns:
Resulting predictions (contained in the column ‘predictions’). Returned immediately, updated automatically when results are completed.
- Return type:
- predict_proba(X, wait_for_autopilot=False, **kwargs)
Calculate class probabilities using the present champion.
Only available for classifier and clustering models.
- Parameters:
X (pandas.DataFrame or str) – Dataset to be scored - target column can be included or omitted. If str, can be AI catalog dataset id or name (if unambiguous)
wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before making predictions. In non-notebook environments, fit() will always block until complete.
**kwargs (Any) – Other keyword arguments to pass to the _predict function
- Returns:
Resulting predictions; probabilities for each label are contained in the column ‘class_{label}’; returned immediately, updated automatically when results are completed.
- Return type:
- set_params(**kwargs)
Set configuration parameters for the intra-project model.
- Parameters:
**kwargs (Any) – Configuration parameters to be set or updated for this model.
- Returns:
self – IntraProjectModel instance
- Return type:
IntraProjectModel
Notes
Configuration parameters for the underlying base model can be set by calling set_params() on the base_model attribute.
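The two levels of configuration can be sketched as follows, using only `set_params()`, `get_params()`, and the `base_model` attribute documented on this page. The helper name `configure` is hypothetical, and no specific parameter names are assumed: valid keys depend on the model being configured.

```python
def configure(model, wrapper_params=None, base_params=None):
    """Apply parameters to the wrapper model and to its base model.

    Hypothetical helper: `model` is assumed to expose set_params(),
    get_params(), and base_model as documented on this page. Parameter
    names are left entirely to the caller.
    """
    if wrapper_params:
        model.set_params(**wrapper_params)
    if base_params:
        # Base-model settings are applied through the base_model attribute.
        model.base_model.set_params(**base_params)
    return model.get_params(), model.base_model.get_params()
```

Returning both configurations makes it easy to confirm which level a given setting ended up on.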