Integration with MLflow
MLflow is a Swiss Army knife of tools that help data scientists track their work (through experiments), deploy models (through MLflow serialization), and store models (through the model registry).
drx includes a datarobotx.mlflow module that implements an MLflow model flavor, which simplifies working with DataRobot models in MLflow.
Usage
AutoLogging
You can enable automatic logging of DataRobot models when using drx. Each blueprint executed by DataRobot is logged as a separate MLflow run.
import mlflow
import pandas as pd
import datarobotx as drx
from datarobotx import mlflow as drx_mlflow

# Load the sample Lending Club dataset and parse the date column
base_data = pd.read_csv(
    "https://s3.amazonaws.com/datarobot_public_datasets/10K_2007_to_2011_Lending_Club_Loans_v2_mod_80.csv"
).assign(
    earliest_cr_line=lambda df: pd.to_datetime(
        df.earliest_cr_line  # Fixing datetimes from original file
    )
)

# Enable autologging before fitting so each blueprint is logged as a run
drx_mlflow.autolog()
new_model = drx.AutoMLModel(name="ML_Flow Test Model")
new_model.fit(base_data, target="is_bad")
The results in MLflow can then be accessed via the tracking UI or via the client API. Be careful with the namespace of your imports, because datarobotx.mlflow and the mlflow package share a name. We find it easiest to import the former as drx_mlflow and keep mlflow as-is so you can switch between the two.
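As a rough illustration of the client API route, the sketch below lists run IDs and metrics for one experiment using MLflow's MlflowClient. The helper name summarize_runs and the experiment name you pass in are hypothetical; which experiment drx logs into depends on your MLflow setup and is not specified here.

```python
from typing import Any, Dict, List, Optional, Tuple


def summarize_runs(
    experiment_name: str,
    max_results: int = 10,
    client: Optional[Any] = None,
) -> List[Tuple[str, Dict[str, float]]]:
    """Return (run_id, metrics) pairs for runs in the named experiment.

    `summarize_runs` is a hypothetical helper, not part of drx or MLflow.
    """
    if client is None:
        # Imported lazily so the helper can be defined without MLflow installed.
        from mlflow.tracking import MlflowClient

        client = MlflowClient()

    experiment = client.get_experiment_by_name(experiment_name)
    if experiment is None:
        # No experiment with that name exists on the tracking server.
        return []

    runs = client.search_runs(
        experiment_ids=[experiment.experiment_id],
        max_results=max_results,
    )
    return [(run.info.run_id, dict(run.data.metrics)) for run in runs]


# Usage (assumes a configured tracking server and an existing experiment name):
#   for run_id, metrics in summarize_runs("Default"):
#       print(run_id, metrics)
```

Passing the client as an argument keeps the helper easy to exercise against a stand-in object; in normal use you can omit it.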
By default, drx logs up to ten models from the DataRobot leaderboard. In addition, drx logs artifacts for the top-performing model. You can change these defaults through the parameters of drx.mlflow.autolog.
Loading and Running Predictions
You can also export DataRobot models into the MLflow model registry for use in any environment that supports MLflow execution.
model = drx.AutoMLModel.from_url(
    "https://app.datarobot.com/projects/64232655f1cbcd7e4a686a45/models"
)
model_info = drx_mlflow.log_model(
    model,
    registered_model_name="ml-flow-datarobot-registered-model",
    artifact_path="model",
)
model_version = mlflow.register_model(
    f"runs:/{model_info.run_id}/model", "ml-flow-datarobot-registered-model"
)
Now that the model is saved, you can use it to make predictions:
model_name = "ml-flow-datarobot-registered-model"
model_version = 1
mlflow_model = drx_mlflow.load_model(model_uri=f"models:/{model_name}/{model_version}")
mlflow_model.predict(base_data.sample(500))
You can also make predictions as a pyfunc model:
model_name = "ml-flow-datarobot-registered-model"
model_version = 1
model = mlflow.pyfunc.load_model(model_uri=f"models:/{model_name}/{model_version}")
model.predict(base_data.sample(500))
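The examples above use two MLflow URI schemes: runs:/<run_id>/<artifact_path> when registering and models:/<name>/<version> when loading. The following is a minimal sketch of helpers that build those strings; the function names are hypothetical and are not part of drx or MLflow.

```python
def registered_model_uri(name: str, version: int) -> str:
    """Build a models:/<name>/<version> URI for a registered model version."""
    if not name:
        raise ValueError("name must be a non-empty string")
    if int(version) < 1:
        raise ValueError("version must be a positive integer")
    return f"models:/{name}/{int(version)}"


def run_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build a runs:/<run_id>/<artifact_path> URI for a model logged in a run."""
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    # Strip stray slashes so the path joins cleanly.
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


# Usage with the names from the examples above:
#   uri = registered_model_uri("ml-flow-datarobot-registered-model", 1)
#   mlflow_model = drx_mlflow.load_model(model_uri=uri)
```

Centralizing the URI construction avoids repeating the f-strings and catches an empty name or run ID before the call reaches MLflow.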
Note that before you call predict, you need to have created a drx.Context so that drx can access your DataRobot API credentials. Calling drx.Context once in your session is enough:
drx.Context(token='foo', endpoint='https://app.datarobot.com/api/v2')
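Rather than hard-coding a token, you may prefer to read credentials from the environment. The sketch below assumes the environment variable names DATAROBOT_API_TOKEN and DATAROBOT_ENDPOINT, and the helper context_kwargs_from_env is hypothetical; adjust both to match your own setup.

```python
import os
from typing import Dict, Mapping

# Default endpoint taken from the example above.
DEFAULT_ENDPOINT = "https://app.datarobot.com/api/v2"


def context_kwargs_from_env(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect token and endpoint for drx.Context from environment variables.

    The variable names are an assumption, not something drx requires.
    """
    token = environ.get("DATAROBOT_API_TOKEN")
    if not token:
        raise RuntimeError("Set DATAROBOT_API_TOKEN before calling predict")
    endpoint = environ.get("DATAROBOT_ENDPOINT", DEFAULT_ENDPOINT)
    return {"token": token, "endpoint": endpoint}


# Usage (requires datarobotx to be installed):
#   import datarobotx as drx
#   drx.Context(**context_kwargs_from_env())
```

This keeps the token out of notebooks and source control while still calling drx.Context once per session.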
Filling in historical runs
Sometimes you have already run DataRobot and simply want to log those runs to MLflow after the fact. The following helper function does this:
model = drx.AutoMLModel.from_url(
    "https://app.datarobot.com/projects/64232655f1cbcd7e4a686a45/models"
)
drx_mlflow.log_runs_from_model(model)
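If you have several existing projects, the same two calls can be wrapped in a loop. The sketch below is one possible approach: backfill_projects is a hypothetical helper, and it assumes that a failure on one project should not stop the others.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def backfill_projects(
    project_urls: Iterable[str],
    load_model: Optional[Callable[[str], Any]] = None,
    log_runs: Optional[Callable[[Any], Any]] = None,
) -> Dict[str, Optional[str]]:
    """Log historical runs for each project URL.

    Returns a dict mapping each URL to None on success or to an error message.
    """
    if load_model is None or log_runs is None:
        # Imported lazily; these are the two calls shown in the example above.
        import datarobotx as drx
        from datarobotx import mlflow as drx_mlflow

        load_model = load_model or drx.AutoMLModel.from_url
        log_runs = log_runs or drx_mlflow.log_runs_from_model

    results: Dict[str, Optional[str]] = {}
    for url in project_urls:
        try:
            log_runs(load_model(url))
            results[url] = None
        except Exception as exc:  # broad on purpose: record the failure and continue
            results[url] = str(exc)
    return results


# Usage:
#   backfill_projects(
#       ["https://app.datarobot.com/projects/64232655f1cbcd7e4a686a45/models"]
#   )
```

The returned dict makes it easy to see which projects were logged and which need another attempt.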