About#
Important
drx is an unofficial, experimental library; interfaces will change. We invite and encourage you to use it and provide feedback, but ask that you be mindful of its experimental and unofficial nature when choosing to use it on a project.
Project goals#
drx intends to explore and prototype a programmatic DataRobot experience that is:
- Declarative and simple by default
  - Streamlines common workflows
  - Uses broadly familiar syntax and verbiage where possible
- Unobtrusively customizable
  - Allows default behaviors and configuration to be easily overridden…
  - …but not at the expense of complicating the common experience
- Experimental
  - Accelerates user experimentation
  - Offers new abstractions and concepts for interacting with DataRobot
Configuration of abstractions#
DataRobot provides dozens of settings and configuration options that govern execution behavior. drx aims to provide a streamlined default experience without compromising on the ability to customize, configure, or override.
To this end, drx abstractions are typically structured in the following manner:
- The most important configuration parameters (typically limited to ~7) are exposed and documented in the abstraction’s constructor as keyword arguments. Wherever possible, these keyword arguments are optional.
- Base models for core problem types can be configured using the DRConfig configuration object class. This class:
  - Enables easy, inline discovery and specification of the many DataRobot parameters using autocomplete, docstrings, and nesting of parameters by category.
  - Preserves an equally interchangeable flat dictionary representation.
  - Maintains the same names and descriptions used in the DataRobot REST API for all parameters that get passed to DataRobot.

  See the reference documentation for examples of usage.
- When a parameter name is known in advance, it can be passed directly to the abstraction’s constructor (e.g. AutoMLModel()) or after construction using the set_params() method:
  - As additional optional keyword arguments: AutoMLModel(project_description='foo') or set_params(project_description='foo')
  - Within a dictionary to be unpacked: AutoMLModel(**my_dict_of_params) or set_params(**my_dict_of_params)
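The constructor and set_params() routes described above can be illustrated without installing anything. The following is a minimal sketch, not drx code: SketchModel is a hypothetical stand-in for an abstraction such as AutoMLModel, and having set_params() return the instance is a choice made for this sketch rather than documented drx behavior.

```python
class SketchModel:
    """Hypothetical stand-in for a drx abstraction (not part of drx)."""

    def __init__(self, name=None, **kwargs):
        # One explicit, optional keyword argument plus arbitrary extra parameters
        self.name = name
        self._params = {}
        self.set_params(**kwargs)

    def set_params(self, **kwargs):
        # Store or override parameters after construction
        self._params.update(kwargs)
        return self

    def get_params(self):
        # Return a copy so callers cannot mutate internal state
        return dict(self._params)


my_dict_of_params = {"project_description": "foo"}

# Three ways of supplying the same, already-known parameter name
at_construction = SketchModel(project_description="foo")
after_construction = SketchModel().set_params(project_description="foo")
unpacked = SketchModel(**my_dict_of_params)
```

All three instances end up holding the same parameter dictionary, which is the interchangeability the list above describes.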
Examples#
Configuration objects can be interchanged with dictionaries, trading off ease of discovery (e.g. autocomplete and documentation capabilities) for succinctness.
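# Imports assumed from the package name given in the FAQ (pip install datarobotx)
import datarobotx as drx
from datarobotx import AutoMLModel, DRConfig

# -------------------------------------------------------------------
# Configuration discovered through sequential, shift-tab autocomplete
# -------------------------------------------------------------------
config_1 = DRConfig()  # or construct by calling get_params() on an abstraction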
config_1.Modeling.AutoML.blend_best_models = False
# ----------------------------------------------------------
# Configuration discovered by reading the HTML documentation
# ----------------------------------------------------------
config_2 = drx.ModelingAutoMLConfig(blend_best_models=False)
# -----------------------------------------------------------
# Direct configuration (e.g. parameter name known in advance)
# -----------------------------------------------------------
config_3 = {'blend_best_models': False}
# ----------
# Equivalent
# ----------
model_1 = AutoMLModel(**config_1)
model_2 = AutoMLModel(**config_2)
model_3 = AutoMLModel(**config_3)
model_4 = AutoMLModel(blend_best_models=False)
In general, parameters have unique names, which allows use of the flat representation seen with config_3 and model_4. If parameter names become ambiguous in the future, configurations that use duplicate parameter names will likely need to use the nested representation.
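To make the ambiguity concern concrete, here is a minimal sketch of flattening a nested configuration into a flat dictionary and rejecting duplicate leaf names. The helpers flatten() and to_flat_dict() are hypothetical and written for illustration only; they are not how DRConfig is implemented. The nested keys mirror the Modeling.AutoML.blend_best_models path used in the example above.

```python
def flatten(nested, path=()):
    """Yield (leaf_name, full_path, value) for every leaf of a nested dict."""
    for key, value in nested.items():
        if isinstance(value, dict):
            yield from flatten(value, path + (key,))
        else:
            yield key, path + (key,), value


def to_flat_dict(nested):
    """Collapse a nested config to {leaf_name: value}; fail on duplicate names."""
    flat, seen = {}, {}
    for name, full_path, value in flatten(nested):
        if name in seen:
            first = ".".join(seen[name])
            second = ".".join(full_path)
            raise ValueError(f"ambiguous parameter {name!r}: {first} and {second}")
        seen[name] = full_path
        flat[name] = value
    return flat


nested = {"Modeling": {"AutoML": {"blend_best_models": False}}}
flat = to_flat_dict(nested)  # {'blend_best_models': False}
```

A configuration with the same leaf name under two categories would raise ValueError here, which is the situation in which only the nested form remains unambiguous.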
Async execution#
drx leverages the Python standard library modules asyncio and concurrent.futures to initiate and monitor long-running tasks concurrently when running in an interactive notebook.
This retains interactivity during time-consuming computations, allowing users to explore other ideas or hypotheses from the same notebook while waiting for a job to finish.
In a notebook, methods that return data, such as predict() and predict_proba(), immediately return a drx.FutureDataFrame object without blocking the notebook.
The notebook blocks only when the underlying attributes or data are accessed.
In scripts, the default behavior is serial execution. Each command will run to completion before the next command is executed.
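The deferred-blocking behavior described for notebooks can be sketched with concurrent.futures alone. This is an illustration under stated assumptions, not the implementation of drx.FutureDataFrame: LazyResult and slow_predict are hypothetical names, and a plain list stands in for a data frame.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class LazyResult:
    """Hypothetical proxy: created immediately, blocks on first attribute access."""

    def __init__(self, future):
        self._future = future

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself
        if name == "_future":
            raise AttributeError(name)
        # .result() blocks until the background computation has finished
        return getattr(self._future.result(), name)


def slow_predict(rows):
    time.sleep(0.2)  # stand-in for a long-running scoring job
    return [row * 2 for row in rows]


executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(slow_predict, [1, 2, 3])
preds = LazyResult(future)          # returns without waiting
was_pending = not future.done()     # the job is still running at this point

count_of_two = preds.count(2)       # first attribute access blocks, then calls list.count
executor.shutdown()
```

Creating the proxy costs nothing, so the caller stays responsive; the wait is paid only at the first access to the underlying data.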
| | Notebook | Script |
|---|---|---|
| Concurrent execution | Default behavior | Not available |
| Serial execution | Upon request | Default behavior |
Waiting for fit() to complete in a notebook before predicting or deploying#
model.fit(df, target='your_target_col')
# This predict() will make predictions as soon as a trained model is available
model.predict(score)
# This predict() call blocks the notebook until autopilot has completed
model.predict(score, wait_for_autopilot=True)
# When executed from a stand-alone script, both will block until fit() is complete
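The difference between predicting as soon as any model is available and waiting for the full run can be sketched with two threading events. This is only an illustration of the waiting semantics: BackgroundFitSketch, its model names, and its return value are hypothetical, and nothing here reflects how drx or autopilot is implemented.

```python
import threading
import time


class BackgroundFitSketch:
    """Hypothetical model whose training produces several candidates over time."""

    def __init__(self):
        self.models = []
        self._first_model_ready = threading.Event()
        self._autopilot_done = threading.Event()

    def fit(self):
        # Start training in the background and return immediately
        threading.Thread(target=self._train, daemon=True).start()

    def _train(self):
        for name in ("model_a", "model_b", "model_c"):
            time.sleep(0.05)  # stand-in for training one candidate
            self.models.append(name)
            self._first_model_ready.set()
        self._autopilot_done.set()

    def predict(self, wait_for_autopilot=False):
        # Choose which milestone to wait for before "predicting"
        gate = self._autopilot_done if wait_for_autopilot else self._first_model_ready
        gate.wait()
        return len(self.models)  # number of candidates available at prediction time


model = BackgroundFitSketch()
model.fit()
early_count = model.predict()                         # waits for the first candidate only
final_count = model.predict(wait_for_autopilot=True)  # waits for all candidates
```

In the sketch, the first call may see anywhere from one to three candidates depending on timing, while the second always sees all three.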
FAQ#
How is drx different from the existing python API experience?#
The existing Python client is extremely flexible, powerful, and configurable. However, certain common workflows may require multiple intermediate steps, and the learning curve can be steep for new users.
drx aims to provide a streamlined experience for the most common workflows, while also offering new, experimental high-level abstractions.
Will drx be incorporated into the main python API?#
drx uses the same code repository, CI/CD pipeline, unit testing, and code hygiene standards as the Python API, and code is shared between the projects. Installing drx will install the main DataRobot Python client; installing the main Python client will not install drx.
Certain prototypes and concepts from drx may eventually be included in the main Python API if they prove helpful to users.
Is drx the same as the “Idiomatic Python SDK” project?#
drx is focused on exploring, prototyping, and validating longer-term experimental concepts and provides no guarantees of backwards compatibility or feature completeness.
Can I use drx in my code?#
drx can be installed and used like any other Python package (pip install datarobotx).
Bear in mind that the package relies on features that are in public and private preview, so some functionality may not work if those features are unavailable or not enabled. drx functionality is subject to change and there is no guarantee of backward compatibility.
Certain modules within drx may be documented as experimental and should not be used in production-critical code.
Can I use drx in DataRobot Notebooks?#
Yes, drx can be installed and used in DataRobot Notebooks. You may see the package behave differently in DataRobot Notebooks than in other notebook implementations (e.g. Jupyter, Colab) due to differences in the underlying execution environment.
These differences should diminish as DataRobot Notebooks and drx continue to develop.
Does drx collect any customer information?#
drx does not collect any customer code or data as part of the analytics that DataRobot captures.
Information about which methods in drx or datarobot are called is captured, but this can be turned off by setting the environment variable DATAROBOT_ENABLE_CONSUMER_TRACKING=0 or by adding enable_api_consumer_tracking: no to drconfig.yaml.
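For users who prefer to set the opt-out from Python rather than from the shell, a minimal sketch follows. The variable name and value come from the answer above; setting it before the client libraries are imported is an assumption about when the setting is read, so the import is left as a comment.

```python
import os

# Opt out of method-usage tracking for this process.
# Assumption: the variable should be in place before drx or datarobot is imported.
os.environ["DATAROBOT_ENABLE_CONSUMER_TRACKING"] = "0"

tracking_disabled = os.environ.get("DATAROBOT_ENABLE_CONSUMER_TRACKING") == "0"

# import datarobotx as drx  # the real import would follow here
```

The environment variable affects only the current process and its children; the drconfig.yaml entry is the persistent alternative.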