AutoClusteringModel
- class datarobotx.AutoClusteringModel(name=None, n_clusters=None, **kwargs)
Automated clustering orchestrator.
Trains clustering models asynchronously and exposes the model that currently has the highest silhouette score for predictions or deployment. Training is performed within an automatically created DataRobot project.
- Parameters:
name (str, optional) – Name to use for the DataRobot project that will be created. Alias for the DR ‘project_name’ configuration parameter.
n_clusters (int or list of int, optional) –
Number of clusters to form. Specify as a single integer between 2 and 100 or a list of up to 10 values.
Alias for the DR ‘autopilot_cluster_list’ configuration parameter.
**kwargs – Additional DataRobot configuration parameters for project creation and autopilot execution. See the DRConfig docs for usage examples.
See also
DRConfig
Configuration object for DataRobot project and autopilot settings; also includes detailed usage examples.
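The n_clusters constraint described above can be checked client-side before a project is created. The following is a minimal sketch, not part of datarobotx: `validate_n_clusters` is a hypothetical helper, and it assumes the 2 to 100 range also applies to each element of a list, which the parameter description does not state explicitly.

```python
from typing import List, Optional, Sequence, Union


def validate_n_clusters(
    n_clusters: Union[int, Sequence[int], None],
) -> Optional[List[int]]:
    """Hypothetical helper: normalize n_clusters to a list or raise ValueError."""
    if n_clusters is None:
        return None  # None is the documented default; nothing to validate
    values = [n_clusters] if isinstance(n_clusters, int) else list(n_clusters)
    if not 1 <= len(values) <= 10:
        raise ValueError("n_clusters must contain between 1 and 10 values")
    for k in values:
        # Assumption: the 2..100 range applies to list elements as well.
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(k, bool) or not isinstance(k, int) or not 2 <= k <= 100:
            raise ValueError(f"invalid cluster count: {k!r}")
    return values
```

The normalized list could then be passed as `n_clusters=` when constructing the model.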
Inherited attributes:
dr_model – DataRobot python client datarobot.Model object for the current champion.
dr_project – DataRobot python client datarobot.Project object.
Methods:
fit(X, **kwargs) – Fit clustering models using DataRobot.
Inherited methods:
deploy([wait_for_autopilot, name]) – Deploy the model into ML Ops.
from_project_id(project_id) – Class method to create from an existing project id.
from_url(url) – Class method to initialize from a URL string.
get_params() – Retrieve configuration parameters for the model.
predict(X[, wait_for_autopilot]) – Make batch predictions using the current champion.
predict_proba(X[, wait_for_autopilot]) – Calculate class probabilities using the current champion.
set_params(**kwargs) – Set or update configuration parameters for the model.
share(emails) – Share a project with other users.
- deploy(wait_for_autopilot=False, name=None)
Deploy the model into ML Ops.
- Parameters:
wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before deploying the model. In non-notebook environments, fit() will always block until complete.
name (str, optional, default=None) – Name for the deployment. If None, a name will be generated.
- Returns:
Deployment – Resulting ML Ops deployment
- Return type:
Deployment
- property dr_model: datarobot.Model
DataRobot python client datarobot.Model object for the current champion.
- Returns:
datarobot.Model object associated with this drx model
- Return type:
datarobot.Model
- property dr_project: datarobot.Project
DataRobot python client datarobot.Project object.
- Returns:
datarobot.Project object associated with this drx model
- Return type:
datarobot.Project
- fit(X, **kwargs)
Fit clustering models using DataRobot.
Creates a new DataRobot project, uploads X to DataRobot and starts Autopilot in clustering mode. Exposes the model that currently has the highest silhouette score for making predictions or deployment.
- Parameters:
X (pandas.DataFrame or str) – Training dataset for clustering models. If str, can be AI catalog dataset id or name (if unambiguous)
**kwargs – Additional optional fit-time parameters to pass to DataRobot, e.g. ‘weights’
- Return type:
See also
DRConfig
Configuration object for DataRobot project and autopilot settings; also includes detailed usage examples.
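The constructor and fit() described above might be combined as in the sketch below. The wrapper `fit_clustering`, its default project name, and the `model_cls` injection point are hypothetical; they are included only so the flow can be exercised with a stand-in class when datarobotx or DataRobot credentials are not available.

```python
def fit_clustering(X, n_clusters=(3, 5, 8), name="customer-segments", model_cls=None):
    """Create an AutoClusteringModel (or a stand-in) and start training on X.

    X may be a pandas.DataFrame or an AI catalog dataset id/name, as documented
    for fit(). The name and n_clusters defaults here are illustrative only.
    """
    if model_cls is None:
        # Imported lazily so this sketch can be loaded without datarobotx installed.
        from datarobotx import AutoClusteringModel as model_cls
    model = model_cls(name=name, n_clusters=list(n_clusters))
    model.fit(X)  # creates the project, uploads X, starts Autopilot in clustering mode
    return model
```

Calling `fit_clustering(df)` with the default `model_cls` requires a configured DataRobot connection; passing a stub class allows a dry run of the same calls.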
- classmethod from_project_id(project_id)
Class method to create from an existing project id.
Initializes a new object from the provided project_id. Configuration parameters originally used to create the project and start Autopilot may not be recoverable.
- Parameters:
project_id (str, optional) – DataRobot id for the project from which to initialize the object
- Returns:
model – New AutopilotModel instance
- Return type:
AutopilotModel
Examples
>>> from datarobotx.models.autopilot import AutopilotModel
>>> my_model = AutopilotModel.from_project_id('62f14505bab13ab73593d69e')
- classmethod from_url(url)
Class method to initialize from a URL string.
Useful for copying and pasting between GUI and notebook environments.
- Parameters:
url (str) – URL of a DataRobot GUI page related to the project of interest
- Returns:
model – The constructed AutopilotModel object
- Return type:
AutopilotModel
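To illustrate what initializing from a URL involves, the sketch below pulls a project id out of a GUI URL. It is not the datarobotx implementation: `extract_project_id` is a hypothetical helper, and it assumes the URL path contains a `/projects/<id>` segment where the id is 24 lowercase hexadecimal characters, like the id in the from_project_id example.

```python
import re
from urllib.parse import urlparse

# Assumption: project ids are 24 lowercase hex characters following "/projects/".
_PROJECT_ID = re.compile(r"/projects/([0-9a-f]{24})(?:/|$)")


def extract_project_id(url: str) -> str:
    """Hypothetical helper: return the project id embedded in a GUI URL."""
    path = urlparse(url).path
    match = _PROJECT_ID.search(path)
    if match is None:
        raise ValueError(f"no project id found in URL: {url!r}")
    return match.group(1)
```

The extracted id could then be handed to from_project_id().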
- get_params()
Retrieve configuration parameters for the model.
Note that some parameters may be initialized or materialized server-side after creating a project or starting Autopilot. get_params() only returns the client-side parameters which will be (or were) passed to DataRobot.
- Returns:
config – Configuration object containing the parameters to be used with DataRobot
- Return type:
DRConfig
- predict(X, wait_for_autopilot=False, **kwargs)
Make batch predictions using the current champion.
Predictions are calculated asynchronously: the call returns immediately, and the returned DataFrame is populated with data once predictions are completed.
Predictions are made within the project containing the model using modeling workers. For real-time predictions, first deploy the model.
- Parameters:
X (pandas.DataFrame or str) – Dataset to be scored - target column can be included or omitted. If str, can be AI catalog dataset id or name (if unambiguous)
wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before making predictions. In non-notebook environments, fit() will always block until complete.
**kwargs (Any) – Other keyword arguments to pass to the _predict function
- Returns:
Resulting predictions (contained in the column ‘predictions’). Returned immediately and updated automatically when results are completed.
- Return type:
pandas.DataFrame
- predict_proba(X, wait_for_autopilot=False, **kwargs)
Calculate class probabilities using the current champion.
Only available for classifier and clustering models.
- Parameters:
X (pandas.DataFrame or str) – Dataset to be scored - target column can be included or omitted. If str, can be AI catalog dataset id or name (if unambiguous)
wait_for_autopilot (bool, optional, default=False) – If True, wait for autopilot to complete before making predictions. In non-notebook environments, fit() will always block until complete.
**kwargs (Any) – Other keyword arguments to pass to the _predict function
- Returns:
Resulting predictions; probabilities for each label are contained in the column ‘class_{label}’; returned immediately, updated automatically when results are completed.
- Return type:
pandas.DataFrame
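Given per-label probabilities in ‘class_{label}’ columns, a hard cluster assignment can be derived by taking the most probable label per row. The sketch below works on plain dictionaries (for a DataFrame, `df.to_dict("records")` yields this shape); `most_likely_labels` is a hypothetical helper, not part of datarobotx, and the column values in the usage note are made up for illustration.

```python
def most_likely_labels(rows, prefix="class_"):
    """Hypothetical helper: pick the label with the highest probability per row."""
    labels = []
    for row in rows:
        # Keep only the probability columns and strip the prefix to get the label.
        probs = {
            col[len(prefix):]: p for col, p in row.items() if col.startswith(prefix)
        }
        if not probs:
            raise ValueError("row has no 'class_{label}' columns")
        labels.append(max(probs, key=probs.get))
    return labels
```

For example, a row `{"class_0": 0.7, "class_1": 0.3}` is assigned the label `"0"`. Labels come back as strings because they are taken from the column names.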
See also