ModelingAutoMLConfig
- class datarobotx.ModelingAutoMLConfig(prepare_model_for_deployment=None, consider_blenders_in_recommendation=None, run_leakage_removed_feature_list=None, sample_step_pct=None, blend_best_models=None, blueprint_threshold=None, scoring_code_only=None, seed=None, allowed_pairwise_interaction_groups=None, stop_words=None, min_secondary_validation_model_count=None, autopilot_cluster_list=None, rate_top_pct_threshold=None, incremental_learning_only_mode=None, incremental_learning_on_best_model=None, number_of_incremental_learning_iterations_before_best_model_selection=None)
AutoML additional modeling options.
Parameters that default to ‘None’ (or are omitted by the user) are overridden to server-side defaults at runtime. Consult the DataRobot REST API and GUI documentation for additional information on each parameter.
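The "None means server-side default" convention can be illustrated with a minimal sketch. The helper `explicit_overrides` below is hypothetical and is not part of datarobotx; it only shows the idea that options left as None carry no client-side value, so the server is free to apply its own default. How datarobotx handles this internally is not documented here.

```python
from typing import Any, Dict


def explicit_overrides(**options: Any) -> Dict[str, Any]:
    """Return only the options that were given a value other than None.

    Hypothetical helper for illustration; not a datarobotx function.
    """
    return {name: value for name, value in options.items() if value is not None}


# Parameter names are taken from the class signature; values are arbitrary.
overrides = explicit_overrides(
    seed=42,
    blend_best_models=False,
    scoring_code_only=None,  # left unset, so the server-side default would apply
)
print(overrides)
```

Note that `False` is kept as an explicit override, while `None` is treated as "not specified".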
- Parameters:
prepare_model_for_deployment (bool) – Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning ‘RECOMMENDED FOR DEPLOYMENT’ label.
consider_blenders_in_recommendation (bool) – Include blenders when selecting a model to prepare for deployment in an Autopilot Run. This option is not supported in SHAP-only mode or for multilabel projects.
run_leakage_removed_feature_list (bool) – Run Autopilot on Leakage Removed feature list (if exists).
sample_step_pct (float) – A float between 0 and 100 indicating the desired percentage of data to sample when training models in comprehensive Autopilot. Note: this is only supported for comprehensive Autopilot, and the specified value may be lowered in order to be compatible with the project’s dataset and partition settings.
blend_best_models (bool) – Blend best models during Autopilot run. This option is not supported in SHAP-only mode or for multilabel projects.
blueprint_threshold (int) – The runtime limit (in hours); a model whose runtime exceeds it is excluded from Autopilot runs.
scoring_code_only (bool) – Keep only models that can be converted to scorable Java code during Autopilot run.
seed (int) – A seed to use for randomization.
allowed_pairwise_interaction_groups (list of list of str) – For GAM models: groups of columns for which pairwise interactions are allowed. For example, if set to [[‘A’, ‘B’, ‘C’], [‘C’, ‘D’]], GAM models allow interactions between columns AxB, BxC, AxC, and CxD; all others (AxD, BxD) are not considered. If not specified, the model considers all possible interactions.
stop_words (list of str) – A list of stop words to be used for text blueprints. Note: stop_words=True must be set in the blueprint preprocessing parameters for this list of stop words to actually be used during preprocessing.
min_secondary_validation_model_count (int) – Compute ‘All backtest’ scores (datetime models) or cross validation scores for the specified number of highest ranking models on the Leaderboard, if over the Autopilot default.
autopilot_cluster_list (list of int) – A list of integers where each value will be used as the number of clusters in Autopilot model(s) for unsupervised clustering projects. Cannot be specified unless unsupervisedMode is true and unsupervisedType is set to clustering.
rate_top_pct_threshold (float) – The percentage threshold between 0.1 and 50 for specifying the Rate@Top% metric.
incremental_learning_only_mode (bool) – Keep only models that support incremental learning during Autopilot run.
incremental_learning_on_best_model (bool) – Run incremental learning on the best model during Autopilot run.
number_of_incremental_learning_iterations_before_best_model_selection (int) – Number of iterations the top 5 models complete before best model selection.
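Two of the parameters above state numeric ranges: sample_step_pct (0 to 100) and rate_top_pct_threshold (0.1 to 50). A client-side check before submitting a configuration might look like the sketch below. The helper `check_documented_range` is hypothetical, and treating both ends of each range as inclusive is an assumption; the server remains the authority on what it accepts.

```python
from typing import Dict, Optional, Tuple

# Ranges quoted from the parameter descriptions; inclusive bounds are an assumption.
DOCUMENTED_RANGES: Dict[str, Tuple[float, float]] = {
    "sample_step_pct": (0.0, 100.0),
    "rate_top_pct_threshold": (0.1, 50.0),
}


def check_documented_range(name: str, value: Optional[float]) -> None:
    """Raise ValueError if value lies outside the documented range for name.

    None (meaning "use the server-side default") and parameters without a
    documented range are accepted without checks.
    """
    if value is None or name not in DOCUMENTED_RANGES:
        return
    low, high = DOCUMENTED_RANGES[name]
    if not low <= value <= high:
        raise ValueError(
            f"{name}={value} is outside the documented range [{low}, {high}]"
        )


check_documented_range("sample_step_pct", 25.0)        # within range, no error
check_documented_range("rate_top_pct_threshold", None)  # unset, no error
```

Failing early on the client avoids a round trip to the server for values that the documentation already rules out.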
See also
DRConfig
Configuration object for DataRobot project and autopilot settings, also includes detailed examples of usage
Attributes:
allowed_pairwise_interaction_groups – For GAM models: groups of columns for which pairwise interactions are allowed.
autopilot_cluster_list – A list of integers where each value will be used as the number of clusters in Autopilot model(s) for unsupervised clustering projects.
blend_best_models – Blend best models during Autopilot run.
blueprint_threshold – The runtime limit (in hours); a model whose runtime exceeds it is excluded from Autopilot runs.
consider_blenders_in_recommendation – Include blenders when selecting a model to prepare for deployment in an Autopilot Run.
incremental_learning_on_best_model – Run incremental learning on the best model during Autopilot run.
incremental_learning_only_mode – Keep only models that support incremental learning during Autopilot run.
min_secondary_validation_model_count – Compute ‘All backtest’ scores (datetime models) or cross validation scores for the specified number of highest ranking models on the Leaderboard, if over the Autopilot default.
number_of_incremental_learning_iterations_before_best_model_selection – Number of iterations the top 5 models complete before best model selection.
prepare_model_for_deployment – Prepare model for deployment during Autopilot run.
rate_top_pct_threshold – The percentage threshold between 0.1 and 50 for specifying the Rate@Top% metric.
run_leakage_removed_feature_list – Run Autopilot on Leakage Removed feature list (if exists).
sample_step_pct – A float between 0 and 100 indicating the desired percentage of data to sample when training models in comprehensive Autopilot.
scoring_code_only – Keep only models that can be converted to scorable Java code during Autopilot run.
seed – A seed to use for randomization.
stop_words – A list of stop words to be used for text blueprints.
Inherited methods:
keys()
to_dict() – Return configuration as a dict.
- property allowed_pairwise_interaction_groups: List[List[str]]
For GAM models: groups of columns for which pairwise interactions are allowed. For example, if set to [[‘A’, ‘B’, ‘C’], [‘C’, ‘D’]], GAM models allow interactions between columns AxB, BxC, AxC, and CxD; all others (AxD, BxD) are not considered. If not specified, the model considers all possible interactions.
Notes
allowed_pairwise_interaction_groups : list of list of str
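To see which column pairs a given set of groups permits, the groups can be expanded with `itertools.combinations`. This is a sketch of the rule as described in the text, not DataRobot's implementation; `allowed_pairs` is a hypothetical helper name.

```python
from itertools import combinations
from typing import List, Set, Tuple


def allowed_pairs(groups: List[List[str]]) -> Set[Tuple[str, str]]:
    """Expand interaction groups into the set of permitted column pairs.

    Each pair is stored in sorted order so that (A, B) and (B, A) are
    treated as the same interaction.
    """
    pairs: Set[Tuple[str, str]] = set()
    for group in groups:
        # Every two distinct columns inside one group may interact.
        for first, second in combinations(sorted(set(group)), 2):
            pairs.add((first, second))
    return pairs


pairs = allowed_pairs([["A", "B", "C"], ["C", "D"]])
print(sorted(pairs))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```

Columns A and D never share a group, so the pair AxD is absent from the result, matching the example in the description.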
- property autopilot_cluster_list: List[int]
A list of integers where each value will be used as the number of clusters in Autopilot model(s) for unsupervised clustering projects. Cannot be specified unless unsupervisedMode is true and unsupervisedType is set to clustering.
Notes
autopilot_cluster_list : list of int
- property blend_best_models: bool
Blend best models during Autopilot run. This option is not supported in SHAP-only mode or for multilabel projects.
Notes
blend_best_models : bool
- property blueprint_threshold: int
The runtime limit (in hours); a model whose runtime exceeds it is excluded from Autopilot runs.
Notes
blueprint_threshold : int
- property consider_blenders_in_recommendation: bool
Include blenders when selecting a model to prepare for deployment in an Autopilot Run. This option is not supported in SHAP-only mode or for multilabel projects.
Notes
consider_blenders_in_recommendation : bool
- property incremental_learning_on_best_model: bool
Run incremental learning on the best model during Autopilot run.
- property min_secondary_validation_model_count: int
Compute ‘All backtest’ scores (datetime models) or cross validation scores for the specified number of highest ranking models on the Leaderboard, if over the Autopilot default.
Notes
min_secondary_validation_model_count : int
- property prepare_model_for_deployment: bool
Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning ‘RECOMMENDED FOR DEPLOYMENT’ label.
Notes
prepare_model_for_deployment : bool
- property rate_top_pct_threshold: float
The percentage threshold between 0.1 and 50 for specifying the Rate@Top% metric.
Notes
rate_top_pct_threshold : float
- property run_leakage_removed_feature_list: bool
Run Autopilot on Leakage Removed feature list (if exists).
Notes
run_leakage_removed_feature_list : bool
- property sample_step_pct: float
A float between 0 and 100 indicating the desired percentage of data to sample when training models in comprehensive Autopilot. Note: this is only supported for comprehensive Autopilot, and the specified value may be lowered in order to be compatible with the project’s dataset and partition settings.
Notes
sample_step_pct : float
- property scoring_code_only: bool
Keep only models that can be converted to scorable Java code during Autopilot run.
Notes
scoring_code_only : bool