Integration with MLflow#

MLflow is a Swiss Army knife of tools that help data scientists track their work (through experiments), deploy models (through MLflow serialization), and store models (in the model registry).

drx includes a datarobotx.mlflow module that implements an MLflow model flavor to simplify working with DataRobot models in MLflow.

Usage#

AutoLogging#

You can enable automatic logging of DataRobot models when using drx. Each blueprint executed by DataRobot is logged as a separate MLflow run.

import mlflow
import pandas as pd

import datarobotx as drx
from datarobotx import mlflow as drx_mlflow

base_data = pd.read_csv(
    "https://s3.amazonaws.com/datarobot_public_datasets/10K_2007_to_2011_Lending_Club_Loans_v2_mod_80.csv"
).assign(
    earliest_cr_line=lambda df: pd.to_datetime(
        df.earliest_cr_line  # Fixing datetimes from original file
    )
)

drx_mlflow.autolog()
new_model = drx.AutoMLModel(name="ML_Flow Test Model")
new_model.fit(base_data, target="is_bad")

The results in MLflow can then be accessed through the tracking UI or the client API. Be careful with the namespace of your imports: datarobotx.mlflow and mlflow are separate modules, and we find it easiest to import them as drx_mlflow and mlflow so you can switch between the two.
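As an illustration of the client API route, the sketch below reads logged runs back into a table with mlflow.search_runs, which returns a pandas DataFrame whose metric columns are prefixed with "metrics.". The helper names metric_columns and summarize_runs are hypothetical, and the experiment that drx logs to is not stated here, so the experiment name is left as an argument. The experiment_names parameter assumes a reasonably recent MLflow release.

```python
def metric_columns(columns):
    """Return the column names that hold MLflow metrics."""
    return [name for name in columns if name.startswith("metrics.")]


def summarize_runs(experiment_name):
    """Fetch the runs of one experiment and keep the run ID and metric columns.

    `experiment_name` is an assumption: pass whichever experiment your
    autologged runs ended up in (MLflow's default experiment is "Default").
    """
    import mlflow  # imported lazily so metric_columns works without MLflow

    runs = mlflow.search_runs(experiment_names=[experiment_name])
    keep = ["run_id"] + metric_columns(runs.columns)
    return runs[keep]


# Example call, once runs have been logged:
# summarize_runs("Default")
```

Discovering the metric columns from the result, rather than naming them, avoids guessing which metrics drx records for a given project.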

By default, drx logs up to ten models from the DataRobot leaderboard and, in addition, logs artifacts for the top-performing model. You can change these defaults through the parameters of drx.mlflow.autolog.

Loading and Running Predictions#

You can also export DataRobot models into the MLflow model registry for use in any environment that supports MLflow execution.

model = drx.AutoMLModel.from_url(
    "https://app.datarobot.com/projects/64232655f1cbcd7e4a686a45/models"
)
model_info = drx_mlflow.log_model(
    model,
    registered_model_name="ml-flow-datarobot-registered-model",
    artifact_path="model",
)
model_version = mlflow.register_model(
    f"runs:/{model_info.run_id}/model", "ml-flow-datarobot-registered-model"
)

Now that the model is saved, you can use it to make predictions:

model_name = "ml-flow-datarobot-registered-model"

model_version = 1

mlflow_model = drx_mlflow.load_model(model_uri=f"models:/{model_name}/{model_version}")
mlflow_model.predict(base_data.sample(500))

You can also load the model as a generic pyfunc model and make predictions:

model_name = "ml-flow-datarobot-registered-model"

model_version = 1

model = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{model_version}")

model.predict(base_data.sample(500))
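Both loading examples hard-code model_version = 1, which stops pointing at the newest model once it is registered again. A minimal sketch of one alternative, assuming the standard MLflow registry client: look up the registered versions with MlflowClient.search_model_versions and build the models:/ URI from the highest version number. The helper names latest_version_uri and load_latest are hypothetical.

```python
def latest_version_uri(model_name, version_numbers):
    """Build a models:/ URI for the highest version in `version_numbers`.

    MLflow reports version numbers as strings, so they are compared as
    integers ("10" must rank above "2"). Raises ValueError if the
    collection is empty.
    """
    newest = max(int(v) for v in version_numbers)
    return f"models:/{model_name}/{newest}"


def load_latest(model_name):
    """Load the newest registered version of `model_name` as a pyfunc model."""
    import mlflow  # imported lazily so latest_version_uri works without MLflow
    from mlflow.tracking import MlflowClient

    versions = MlflowClient().search_model_versions(f"name='{model_name}'")
    uri = latest_version_uri(model_name, [v.version for v in versions])
    return mlflow.pyfunc.load_model(model_uri=uri)


# Example call:
# load_latest("ml-flow-datarobot-registered-model").predict(base_data.sample(500))
```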

Note that before you call predict, you must have created a drx.Context so that drx can access your DataRobot API credentials. Calling drx.Context once in your session is enough:

drx.Context(token='foo', endpoint='https://app.datarobot.com/api/v2') 
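To keep a real token out of source code, the context can be built from environment variables instead. The sketch below is one way to do that. The variable names DATAROBOT_API_TOKEN and DATAROBOT_ENDPOINT and the helper names context_kwargs and init_context are assumptions for illustration; only the token and endpoint arguments of drx.Context come from the example above.

```python
import os

# Endpoint taken from the example above; override it through the environment.
DEFAULT_ENDPOINT = "https://app.datarobot.com/api/v2"


def context_kwargs(env=os.environ):
    """Collect drx.Context arguments from a mapping of environment variables."""
    token = env.get("DATAROBOT_API_TOKEN")  # assumed variable name
    if not token:
        raise RuntimeError("Set DATAROBOT_API_TOKEN before calling predict()")
    return {
        "token": token,
        "endpoint": env.get("DATAROBOT_ENDPOINT", DEFAULT_ENDPOINT),
    }


def init_context():
    """Create the drx.Context once per session from the environment."""
    import datarobotx as drx  # imported lazily so context_kwargs is testable

    return drx.Context(**context_kwargs())
```

Calling init_context() once at the top of a scoring script or notebook would then replace the hard-coded drx.Context(...) line.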

Filling in historical runs#

Sometimes you have already run DataRobot and simply want to log those runs to MLflow after the fact. This can be done with the following helper function:

model = drx.AutoMLModel.from_url(
    "https://app.datarobot.com/projects/64232655f1cbcd7e4a686a45/models"
)

drx_mlflow.log_runs_from_model(model)