ModelingAutoMLConfig

class datarobotx.ModelingAutoMLConfig(prepare_model_for_deployment=None, consider_blenders_in_recommendation=None, run_leakage_removed_feature_list=None, sample_step_pct=None, blend_best_models=None, blueprint_threshold=None, scoring_code_only=None, seed=None, allowed_pairwise_interaction_groups=None, stop_words=None, min_secondary_validation_model_count=None, autopilot_cluster_list=None, rate_top_pct_threshold=None, incremental_learning_only_mode=None, incremental_learning_on_best_model=None)

AutoML additional modeling options.

Parameters that default to ‘None’ (or are omitted by the user) are replaced with server-side defaults at runtime. Consult the DataRobot REST API and GUI documentation for more information on each parameter.
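As a rough illustration of this convention, the sketch below overlays user-supplied options on a table of server-side defaults and skips any option left as None. Note that an explicit False is a real value and is kept. SERVER_DEFAULTS and resolve_options are hypothetical placeholders, not part of datarobotx; the real defaults are chosen by the DataRobot server.

```python
from typing import Any, Dict, Optional

# Hypothetical placeholder defaults; the real values are decided server-side.
SERVER_DEFAULTS: Dict[str, Any] = {
    "blend_best_models": True,
    "scoring_code_only": False,
    "seed": 0,
}


def resolve_options(user_options: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Return effective options: user values win unless they are None."""
    explicit = {k: v for k, v in user_options.items() if v is not None}
    # Start from the defaults, then overlay the explicitly set values.
    return {**SERVER_DEFAULTS, **explicit}


effective = resolve_options({"blend_best_models": False, "seed": None})
print(effective)  # {'blend_best_models': False, 'scoring_code_only': False, 'seed': 0}
```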

Parameters:
  • prepare_model_for_deployment (bool) – Prepare a model for deployment during the Autopilot run. Preparation includes creating reduced feature list models, retraining the best model on a larger sample size, computing insights, and assigning the ‘RECOMMENDED FOR DEPLOYMENT’ label.

  • consider_blenders_in_recommendation (bool) – Include blenders when selecting a model to prepare for deployment in an Autopilot run. This option is not supported in SHAP-only mode or for multilabel projects.

  • run_leakage_removed_feature_list (bool) – Run Autopilot on the Leakage Removed feature list, if it exists.

  • sample_step_pct (float) – A float between 0 and 100 indicating the desired percentage of data to sample when training models in comprehensive Autopilot. Note: this is only supported for comprehensive Autopilot, and the specified value may be lowered to be compatible with the project’s dataset and partition settings.

  • blend_best_models (bool) – Blend the best models during the Autopilot run. This option is not supported in SHAP-only mode or for multilabel projects.

  • blueprint_threshold (int) – The runtime threshold, in hours; a model whose runtime exceeds it is excluded from Autopilot runs.

  • scoring_code_only (bool) – Keep only models that can be converted to scorable Java code during the Autopilot run.

  • seed (int) – A seed to use for randomization.

  • allowed_pairwise_interaction_groups (list of list of str) – For GAM models, specify groups of columns for which pairwise interactions are allowed. For example, if set to [[‘A’, ‘B’, ‘C’], [‘C’, ‘D’]], GAM models allow interactions between columns AxB, BxC, AxC, and CxD; all other pairs (AxD, BxD) are not considered. If not specified, the model considers all possible interactions.

  • stop_words (list of str) – A list of stop words to be used for text blueprints. Note: stop_words=True must be set in the blueprint preprocessing parameters for this list of stop words to actually be used during preprocessing.

  • min_secondary_validation_model_count (int) – Compute ‘All backtest’ scores (datetime models) or cross-validation scores for the specified number of highest-ranking models on the Leaderboard, if that number is greater than the Autopilot default.

  • autopilot_cluster_list (list of int) – A list of integers where each value will be used as the number of clusters in Autopilot model(s) for unsupervised clustering projects. Cannot be specified unless unsupervisedMode is true and unsupervisedType is set to clustering.

  • rate_top_pct_threshold (float) – The percentage threshold, between 0.1 and 50, used by the Rate@Top% metric.

  • incremental_learning_only_mode (bool) – Keep only models that support incremental learning during Autopilot run.

  • incremental_learning_on_best_model (bool) – Run incremental learning on the best model during Autopilot run.
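To catch obviously invalid values before starting Autopilot, one option is a local pre-flight check of the constraints stated above. The sketch below is a hypothetical helper (check_modeling_options is not part of datarobotx); it encodes only the documented ranges, assumes their bounds are inclusive, and does not reproduce server-side behavior such as lowering sample_step_pct to fit the partition settings.

```python
from typing import List, Optional


def check_modeling_options(
    sample_step_pct: Optional[float] = None,
    rate_top_pct_threshold: Optional[float] = None,
    autopilot_cluster_list: Optional[List[int]] = None,
    is_clustering_project: bool = False,
) -> List[str]:
    """Return the problems found; an empty list means none were detected."""
    problems: List[str] = []
    # Documented range: between 0 and 100 (bounds assumed inclusive here).
    if sample_step_pct is not None and not 0 <= sample_step_pct <= 100:
        problems.append("sample_step_pct must be between 0 and 100")
    # Documented range: between 0.1 and 50 (bounds assumed inclusive here).
    if rate_top_pct_threshold is not None and not 0.1 <= rate_top_pct_threshold <= 50:
        problems.append("rate_top_pct_threshold must be between 0.1 and 50")
    if autopilot_cluster_list is not None:
        # Documented as valid only for unsupervised clustering projects.
        if not is_clustering_project:
            problems.append("autopilot_cluster_list requires an unsupervised clustering project")
        if not all(isinstance(n, int) for n in autopilot_cluster_list):
            problems.append("autopilot_cluster_list must contain integers")
    return problems


print(check_modeling_options(sample_step_pct=150, autopilot_cluster_list=[3, 5]))
```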

See also

DRConfig

Configuration object for DataRobot project and Autopilot settings; also includes detailed usage examples.

Attributes:

allowed_pairwise_interaction_groups

For GAM models, specify groups of columns for which pairwise interactions are allowed.

autopilot_cluster_list

A list of integers where each value will be used as the number of clusters in Autopilot model(s) for unsupervised clustering projects.

blend_best_models

Blend the best models during the Autopilot run.

blueprint_threshold

The runtime threshold, in hours; a model whose runtime exceeds it is excluded from Autopilot runs.

consider_blenders_in_recommendation

Include blenders when selecting a model to prepare for deployment in an Autopilot run.

incremental_learning_on_best_model

Run incremental learning on the best model during the Autopilot run.

incremental_learning_only_mode

Keep only models that support incremental learning during the Autopilot run.

min_secondary_validation_model_count

Compute 'All backtest' scores (datetime models) or cross-validation scores for the specified number of highest-ranking models on the Leaderboard, if that number is greater than the Autopilot default.

prepare_model_for_deployment

Prepare a model for deployment during the Autopilot run.

rate_top_pct_threshold

The percentage threshold, between 0.1 and 50, used by the Rate@Top% metric.

run_leakage_removed_feature_list

Run Autopilot on the Leakage Removed feature list, if it exists.

sample_step_pct

A float between 0 and 100 indicating the desired percentage of data to sample when training models in comprehensive Autopilot.

scoring_code_only

Keep only models that can be converted to scorable Java code during the Autopilot run.

seed

A seed to use for randomization.

stop_words

A list of stop words to be used for text blueprints.

Inherited methods:

keys()

Return type:

Collection[str]

to_dict()

Return configuration as a dict.

property allowed_pairwise_interaction_groups: List[List[str]]

For GAM models, specify groups of columns for which pairwise interactions are allowed. For example, if set to [[‘A’, ‘B’, ‘C’], [‘C’, ‘D’]], GAM models allow interactions between columns AxB, BxC, AxC, and CxD; all other pairs (AxD, BxD) are not considered. If not specified, the model considers all possible interactions.

Notes

allowed_pairwise_interaction_groups : list of list of str
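The expansion rule in the example above can be written as a short function: each group contributes every pair of distinct columns it contains, and a pair that appears in two groups is counted once. This is a hypothetical sketch of the documented semantics, not DataRobot's implementation; allowed_pairs is an illustrative name.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def allowed_pairs(groups: List[List[str]]) -> Set[FrozenSet[str]]:
    """Expand interaction groups into the unordered column pairs they permit."""
    pairs: Set[FrozenSet[str]] = set()
    for group in groups:
        # Every pair of distinct columns inside the same group may interact.
        for a, b in combinations(group, 2):
            pairs.add(frozenset((a, b)))
    return pairs


pairs = allowed_pairs([["A", "B", "C"], ["C", "D"]])
print(sorted("x".join(sorted(p)) for p in pairs))  # ['AxB', 'AxC', 'BxC', 'CxD']
print(frozenset(("A", "D")) in pairs)  # False
```

Using frozenset makes AxB and BxA the same pair, matching the unordered notation in the description.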

property autopilot_cluster_list: List[int]

A list of integers where each value will be used as the number of clusters in Autopilot model(s) for unsupervised clustering projects. Cannot be specified unless unsupervisedMode is true and unsupervisedType is set to clustering.

Notes

autopilot_cluster_list : list of int

property blend_best_models: bool

Blend the best models during the Autopilot run. This option is not supported in SHAP-only mode or for multilabel projects.

Notes

blend_best_models : bool

property blueprint_threshold: int

The runtime threshold, in hours; a model whose runtime exceeds it is excluded from Autopilot runs.

Notes

blueprint_threshold : int
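To make the units and the comparison concrete, the sketch below reads "exceeded" as strictly greater than: with a threshold of 2, a model running 3.5 hours is excluded while one running exactly 2 hours is kept. It assumes per-model runtimes are already known; how DataRobot estimates them is not described here, and within_runtime_threshold, the model names, and the runtimes are all hypothetical.

```python
from typing import Dict, List


def within_runtime_threshold(runtimes_hours: Dict[str, float], threshold_hours: int) -> List[str]:
    """Return names of models whose runtime does not exceed the threshold."""
    # Exclude a model only when its runtime is strictly greater than the threshold.
    return [name for name, hours in runtimes_hours.items() if hours <= threshold_hours]


runtimes = {"model_a": 0.5, "model_b": 2.0, "model_c": 3.5}  # hypothetical runtimes
print(within_runtime_threshold(runtimes, 2))  # ['model_a', 'model_b']
```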

property consider_blenders_in_recommendation: bool

Include blenders when selecting a model to prepare for deployment in an Autopilot run. This option is not supported in SHAP-only mode or for multilabel projects.

Notes

consider_blenders_in_recommendation : bool

property incremental_learning_on_best_model: bool

Run incremental learning on the best model during the Autopilot run.

property incremental_learning_only_mode: bool

Keep only models that support incremental learning during the Autopilot run.

property min_secondary_validation_model_count: int

Compute ‘All backtest’ scores (datetime models) or cross-validation scores for the specified number of highest-ranking models on the Leaderboard, if that number is greater than the Autopilot default.

Notes

min_secondary_validation_model_count : int

property prepare_model_for_deployment: bool

Prepare a model for deployment during the Autopilot run. Preparation includes creating reduced feature list models, retraining the best model on a larger sample size, computing insights, and assigning the ‘RECOMMENDED FOR DEPLOYMENT’ label.

Notes

prepare_model_for_deployment : bool

property rate_top_pct_threshold: float

The percentage threshold, between 0.1 and 50, used by the Rate@Top% metric.

Notes

rate_top_pct_threshold : float
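Rate@Top% is commonly read as the positive-class rate among the highest-scoring predictions. The sketch below computes it under that assumption, taking the top ceil(n × pct / 100) rows by predicted score; DataRobot's exact rounding and tie-handling rules may differ, and rate_at_top_pct is a hypothetical helper.

```python
import math
from typing import Sequence


def rate_at_top_pct(y_true: Sequence[int], scores: Sequence[float], pct: float) -> float:
    """Share of positive labels among the top `pct` percent of rows ranked by score."""
    if not 0.1 <= pct <= 50:
        raise ValueError("pct must be between 0.1 and 50")
    if len(y_true) != len(scores) or len(y_true) == 0:
        raise ValueError("y_true and scores must be non-empty and equally long")
    # Number of rows in the top slice; at least one row is always kept.
    k = max(1, math.ceil(len(scores) * pct / 100))
    # Row indices ordered from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(y_true[i] for i in ranked[:k]) / k


y = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(rate_at_top_pct(y, scores, 20))  # 0.5 (top 2 rows have labels 1 and 0)
```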

property run_leakage_removed_feature_list: bool

Run Autopilot on the Leakage Removed feature list, if it exists.

Notes

run_leakage_removed_feature_list : bool

property sample_step_pct: float

A float between 0 and 100 indicating the desired percentage of data to sample when training models in comprehensive Autopilot. Note: this is only supported for comprehensive Autopilot, and the specified value may be lowered to be compatible with the project’s dataset and partition settings.

Notes

sample_step_pct : float

property scoring_code_only: bool

Keep only models that can be converted to scorable Java code during the Autopilot run.

Notes

scoring_code_only : bool

property seed: int

A seed to use for randomization.

Notes

seed : int

property stop_words: List[str]

A list of stop words to be used for text blueprints. Note: stop_words=True must be set in the blueprint preprocessing parameters for this list of stop words to actually be used during preprocessing.

Notes

stop_words : list of str
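The note about stop_words=True can be illustrated with a toy tokenizer in which the custom list takes effect only when a preprocessing flag is enabled. tokenize and its use_stop_words argument are hypothetical stand-ins for a blueprint's text preprocessing (the flag is renamed here so it does not clash with the list argument); they are not DataRobot parameters.

```python
from typing import List, Optional


def tokenize(text: str, stop_words: Optional[List[str]], use_stop_words: bool) -> List[str]:
    """Lowercase and split text, dropping stop words only when the flag is on."""
    tokens = text.lower().split()
    if use_stop_words and stop_words:
        blocked = {w.lower() for w in stop_words}
        tokens = [t for t in tokens if t not in blocked]
    return tokens


custom = ["the", "of"]
print(tokenize("The cost of the order", custom, use_stop_words=False))
# ['the', 'cost', 'of', 'the', 'order']
print(tokenize("The cost of the order", custom, use_stop_words=True))
# ['cost', 'order']
```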

to_dict()

Return configuration as a dict.

Return type:

Dict[str, Any]