Convert Blueprints to Open Source#

../_images/convertblueprints.png

Convert an exported DataRobot blueprint JSON representation into scikit-learn-compatible pipelines.

Important

A blueprint conversion is not a direct representation of how a DataRobot blueprint would train and predict inside the platform. It is designed to be a similar representation of the blueprint using open source machine learning packages. The accuracy therefore may be significantly different.

Usage#

DataRobot models are constructed from our large repository of blueprints. For each new project, a large number of uniquely constructed blueprints is generated. To improve the transparency of the DataRobot product and to help users better understand the models, we are adding BlueprintConverter functionality to the DRX package as a prototype.

The BlueprintConverter converts a DataRobot blueprint, represented as JSON, into an open source, code-based scikit-learn representation built on the sklearn.pipeline.Pipeline structure.

This is currently a prototype

Please see the “Limitations” section for further details on the limitations of a converted blueprint.

Example Output#

The BlueprintConverter returns a string of Python code that is formatted using the Black Python API. The result of a simple conversion could look something like the following.

The convert function also returns the following helper logic, which has been removed from this example:

  • Required import statements

  • Code for reading data into a dataframe

  • Code for identifying and tagging categorical columns in the imported data

preprocessor = ColumnTransformer(
    [
        (
            "NUM_PNI2_RST",
            Pipeline(
                [
                    ("PNI2", sklearn.impute.SimpleImputer()),
                    ("RST", sklearn.preprocessing.StandardScaler()),
                ]
            ),
            make_column_selector(dtype_include=numpy.number),
        ),
    ]
)

estimator = Pipeline(
    [("preprocessor", preprocessor), ("LENETCDWC", sklearn.linear_model.ElasticNet())]
)

The resulting estimator can then be used to train and predict as is, or the user can add to, tweak, or modify the logic to alter its behavior.
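As a minimal sketch of training and predicting with the example pipeline, the block below restores the imports that were removed from the example and fits the estimator on synthetic data. The column names x1 and x2 and the generated values are assumptions made purely for illustration; a real generated script would read the project's own training data.

```python
import numpy
import pandas as pd
import sklearn.impute
import sklearn.linear_model
import sklearn.preprocessing
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (hypothetical columns, not from a DataRobot project).
rng = numpy.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
X.iloc[::10, 0] = numpy.nan  # introduce missing values for the imputer to fill
y = 3.0 * X["x2"] + rng.normal(scale=0.1, size=50)

# Same structure as the converted example above.
preprocessor = ColumnTransformer(
    [
        (
            "NUM_PNI2_RST",
            Pipeline(
                [
                    ("PNI2", sklearn.impute.SimpleImputer()),
                    ("RST", sklearn.preprocessing.StandardScaler()),
                ]
            ),
            make_column_selector(dtype_include=numpy.number),
        ),
    ]
)

estimator = Pipeline(
    [("preprocessor", preprocessor), ("LENETCDWC", sklearn.linear_model.ElasticNet())]
)

# Train on the data, then predict on the same rows.
estimator.fit(X, y)
predictions = estimator.predict(X)
```

Because the estimator is an ordinary scikit-learn Pipeline, it should also work with standard tooling such as cross-validation or grid search.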

Converting an existing model to an open source code representation#

The blueprint from an existing model can be retrieved and converted to an open source code representation, as shown in the following example.

from datarobot import Model
from datarobotx.openblueprints.blueprint_converter import BlueprintConverter

datarobot_model = Model.get(project="PROJECT_ID", model_id="MODEL_ID")
blueprint_json = datarobot_model.get_model_blueprint_json()
open_source_blueprint = BlueprintConverter.convert(blueprint_json=blueprint_json)

with open("blueprint_code.py", "w", encoding="UTF-8") as blueprint_file:
    blueprint_file.write(open_source_blueprint)

Important

Use Model.get(project="PROJECT_ID", model_id="MODEL_ID") when retrieving a model for blueprint conversion. Constructing the object directly with Model(project_id="PROJECT_ID", id="MODEL_ID") will not populate all of the required blueprint attributes for the model.

Converting an untrained blueprint directly to an open source code representation#

Alternatively, a blueprint that has not been trained can be converted directly to an open source code representation from its blueprint ID. The blueprint itself can be queried directly to retrieve the JSON. The ID of the blueprint can be found through the UI or by retrieving a list of blueprints through the API and identifying the corresponding ID.

from datarobot import Blueprint
from datarobotx.openblueprints.blueprint_converter import BlueprintConverter

datarobot_blueprint = Blueprint(id="BLUEPRINT_ID")
blueprint_json = datarobot_blueprint.get_json()
open_source_blueprint = BlueprintConverter.convert(blueprint_json=blueprint_json)

with open("blueprint_code.py", "w", encoding="UTF-8") as blueprint_file:
    blueprint_file.write(open_source_blueprint)
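Since the converter returns plain Python source as a string, it can be inspected before it is written to disk or run. The sketch below uses the standard library ast module on a hand-written stand-in string; the stand-in content is an assumption and does not reflect the exact text that BlueprintConverter.convert produces.

```python
import ast

# Stand-in for the string returned by BlueprintConverter.convert(...).
# The content is hypothetical and only illustrates the check.
open_source_blueprint = (
    "from sklearn.pipeline import Pipeline\n"
    "from sklearn.linear_model import ElasticNet\n"
    "estimator = Pipeline([('LENETCDWC', ElasticNet())])\n"
)

# ast.parse raises SyntaxError if the text is not valid Python.
# It only parses the source; nothing is executed.
tree = ast.parse(open_source_blueprint)

# Collect the names assigned at the top level of the script,
# for example to confirm that an estimator is defined.
assigned_names = [
    target.id
    for node in tree.body
    if isinstance(node, ast.Assign)
    for target in node.targets
    if isinstance(target, ast.Name)
]
```

A check like this only confirms that the code parses; it says nothing about whether the pipeline trains successfully on a given dataset.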

Limitations#

While the BlueprintConverter makes every effort to convert a JSON blueprint representation into a working, runnable sklearn.pipeline.Pipeline, there are some considerations to be aware of. The blueprints themselves are represented as directed acyclic graphs for processing, which is not directly compatible with the linear, step-by-step way that Pipeline objects pass data from one stage to the next. To work around this, multiple ColumnTransformer pipelines are constructed and stitched together to recreate a similar set of pre-processing conditions.
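To illustrate the general idea, the following sketch expresses two parallel branches of a graph (one numeric, one categorical) as a single ColumnTransformer feeding a final model. It is a hand-written illustration, not output from the converter; the step names, the columns, and the choice of Ridge as the model are all assumptions.

```python
import numpy
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data with the categorical column tagged via the pandas category dtype.
X = pd.DataFrame(
    {
        "age": [25.0, numpy.nan, 47.0, 33.0],
        "income": [40.0, 52.0, 61.0, numpy.nan],
        "city": pd.Categorical(["a", "b", "a", "c"]),
    }
)
y = [1.0, 2.0, 3.0, 2.5]

# Each branch of the graph becomes its own small Pipeline.
numeric_branch = Pipeline([("impute", SimpleImputer()), ("scale", StandardScaler())])
categorical_branch = Pipeline([("onehot", OneHotEncoder(handle_unknown="ignore"))])

# The ColumnTransformer routes columns to each branch and concatenates the outputs.
preprocessor = ColumnTransformer(
    [
        ("numeric", numeric_branch, make_column_selector(dtype_include=numpy.number)),
        ("categorical", categorical_branch, make_column_selector(dtype_include="category")),
    ]
)

estimator = Pipeline([("preprocessor", preprocessor), ("model", Ridge())])
estimator.fit(X, y)

# Two scaled numeric columns plus three one-hot columns for the city values.
transformed = estimator.named_steps["preprocessor"].transform(X)
```

Deeper graphs can be approximated by chaining further ColumnTransformer stages inside the outer Pipeline, at the cost of tracking column positions between stages.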

A few of the key limitations to be aware of at this time:

  • Blueprint components must have a reasonably equivalent open source representation to be converted. If a representation has not yet been defined or determined by DataRobot, the blueprint stage is converted as a "passthrough" stage. A formatted sklearn.compose.ColumnTransformer is still constructed for every stage present in the blueprint, but "passthrough" stages result in different pre-processing or modeling compared to the actual blueprint.

  • Time Series (AutoTS) blueprints will convert but are not supported. Time Series pre-processing, forecast distances, and feature derivation windows are not well supported by open source packages at this time.

  • Pre-processing that occurs outside of the blueprints would need to be manually reconstructed by the user. These tasks include things such as dataprep, calendars, and column transforms a user might perform in the DataRobot interface. Support for this type of functionality could be expanded in a future update.

  • Hyperparameters are not currently converted. The JSON representation lists the hyperparameters used at each step of the blueprint, but mappings between internal DataRobot parameters and their open source equivalents have not been established at this time. The exported code can be modified by the user to set any desired training or processing hyperparameters.
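Two of the limitations above can be worked around by editing the exported code. The sketch below sets hyperparameters on a named step and replaces a "passthrough" stage, using the standard scikit-learn set_params method. The step name UNMAPPED, the parameter values, and the choice of MinMaxScaler as a replacement are assumptions for illustration and are not taken from any DataRobot blueprint.

```python
import sklearn.linear_model
import sklearn.preprocessing
from sklearn.pipeline import Pipeline

estimator = Pipeline(
    [
        ("RST", sklearn.preprocessing.StandardScaler()),
        # Hypothetical stage with no open source equivalent, exported as a passthrough.
        ("UNMAPPED", "passthrough"),
        ("LENETCDWC", sklearn.linear_model.ElasticNet()),
    ]
)

# Set hyperparameters with the "<step name>__<parameter>" convention.
# The values here are illustrative only.
estimator.set_params(LENETCDWC__alpha=0.1, LENETCDWC__l1_ratio=0.2)

# Replace the passthrough stage with a transformer of the user's choosing.
estimator.set_params(UNMAPPED=sklearn.preprocessing.MinMaxScaler())
```

The same double-underscore names can be used as keys in a GridSearchCV parameter grid if the hyperparameters are to be tuned rather than set by hand.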

API Reference#