About#

Important

drx is an unofficial, experimental library; interfaces will change. We invite and encourage you to use it and provide feedback, but ask that you be mindful of its experimental and unofficial nature when choosing to use it on a project.

Project goals#

drx intends to explore and prototype a programmatic DataRobot experience that is:

  1. Declarative and simple by default

    • Streamlines common workflows

    • Uses broadly familiar syntax and verbiage where possible

  2. Unobtrusively customizable

    • Allows default behaviors and configuration to be easily overridden…

    • …but not at the expense of complicating the common experience

  3. Experimental

    • Accelerates user experimentation

    • Offers new abstractions and concepts for interacting with DataRobot

Configuration of abstractions#

DataRobot provides dozens of settings and configuration options that govern execution behavior. drx aims to provide a streamlined default experience without compromising on the ability to customize, configure or override.

To this end, drx abstractions are typically structured in the following manner:

  1. The most important configuration parameters (typically limited to ~7) are exposed and documented in the abstraction’s constructor as keyword arguments.

  2. Wherever possible these keyword arguments are optional.

  3. Base models for core problem types can be configured using the DRConfig configuration object class. This class:

    • Enables easy, inline discovery and specification of the many DataRobot parameters using autocomplete, docstrings, and nesting of parameters by category

    • Preserves an equally interchangeable flat dictionary representation.

    • Maintains the same names and descriptions used in the DataRobot REST API for all parameters that get passed to DR.

    See the reference documentation for examples of usage.

  4. When a parameter name is known in advance, it can be passed directly to the abstraction’s constructor (e.g. AutoMLModel()) or after construction using the set_params() method:

    • As additional optional keyword arguments: AutoMLModel(project_description='foo') or set_params(project_description='foo')

    • Within a dictionary to be unpacked: AutoMLModel(**my_dict_of_params) or set_params(**my_dict_of_params)
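The pattern described in the list above can be sketched in a few lines. This is a minimal illustration only: SketchModel, its name argument, and the choice to have set_params() return the object are hypothetical stand-ins, not the actual AutoMLModel implementation.

```python
class SketchModel:
    """Hypothetical stand-in for a drx-style abstraction; not drx.AutoMLModel."""

    def __init__(self, name=None, **kwargs):
        # A small number of documented, optional keyword arguments...
        self.name = name
        # ...plus any further parameters whose names are known in advance
        self._params = {}
        self.set_params(**kwargs)

    def set_params(self, **kwargs):
        # Assumption for this sketch: return self so calls can be chained
        self._params.update(kwargs)
        return self

    def get_params(self):
        # Return a copy so callers cannot mutate internal state
        return dict(self._params)


# Three ways to supply the same parameter
at_construction = SketchModel(project_description="foo")
after_construction = SketchModel().set_params(project_description="foo")
unpacked = SketchModel(**{"project_description": "foo"})
```

All three objects end up holding the same parameter dictionary, which is the property the keyword-argument and dictionary-unpacking forms rely on.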

Examples#

Configuration objects can be interchanged with dictionaries, trading off ease of discovery (e.g. autocomplete and documentation capabilities) for succinctness.

import datarobotx as drx
from datarobotx import AutoMLModel, DRConfig

# -------------------------------------------------------------------
# Configuration discovered through sequential, shift-tab autocomplete
# -------------------------------------------------------------------
config_1 = DRConfig()  # or construct by calling get_params() on an abstraction
config_1.Modeling.AutoML.blend_best_models = False

# ----------------------------------------------------------
# Configuration discovered by reading the HTML documentation
# ----------------------------------------------------------
config_2 = drx.ModelingAutoMLConfig(blend_best_models=False)

# -----------------------------------------------------------
# Direct configuration (e.g. parameter name known in advance)
# -----------------------------------------------------------
config_3 = {'blend_best_models': False}

# ----------
# Equivalent
# ----------
model_1 = AutoMLModel(**config_1)
model_2 = AutoMLModel(**config_2)
model_3 = AutoMLModel(**config_3)
model_4 = AutoMLModel(blend_best_models=False)

In general, parameters have unique names, which allows use of the flat representation seen with config_3 and model_4. If parameter names become ambiguous in the future, configurations that use duplicated parameter names will likely need to use the nested representation.
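Why unique names make the flat form safe can be shown with a small, self-contained sketch. It uses plain dictionaries rather than DRConfig, and flatten_config, GroupA, GroupB, and threshold are hypothetical names introduced for illustration; how drx performs this conversion internally is not shown in this document.

```python
def flatten_config(nested):
    """Collapse a nested mapping into a flat one keyed by leaf parameter name."""
    flat = {}
    for key, value in nested.items():
        # Recurse into category groupings; treat anything else as a leaf value
        if isinstance(value, dict):
            items = flatten_config(value).items()
        else:
            items = [(key, value)]
        for name, leaf in items:
            if name in flat:
                # Two categories share a leaf name, so the flat form is ambiguous
                raise ValueError(f"ambiguous parameter name: {name!r}")
            flat[name] = leaf
    return flat


# Nesting mirrors config_1.Modeling.AutoML.blend_best_models from the example
nested_config = {"Modeling": {"AutoML": {"blend_best_models": False}}}
flat_config = flatten_config(nested_config)

# A made-up configuration in which two categories reuse one parameter name
try:
    flatten_config({"GroupA": {"threshold": 1}, "GroupB": {"threshold": 2}})
    ambiguous_detected = False
except ValueError:
    ambiguous_detected = True
```

As long as every leaf name is unique, the nested and flat forms carry the same information; once a name repeats, only the nested form can say which category is meant.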

Async execution#

drx uses the Python standard library's asyncio and concurrent.futures modules to start and monitor long-running tasks concurrently when running in an interactive notebook. This keeps the notebook interactive during time-consuming computations, so users can explore other ideas or hypotheses in the same notebook while waiting for a job to finish.

In a notebook, methods that return data, such as predict() and predict_proba(), return a drx.FutureDataFrame object immediately without blocking the notebook. The notebook blocks only when the underlying attributes or data are accessed.

In scripts, the default behavior is serial execution. Each command will run to completion before the next command is executed.

                        Notebook            Script
Concurrent execution    Default behavior    Not available
Serial execution        Upon request        Default behavior
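The "return immediately, block on access" behavior described above can be illustrated with the standard library alone. The sketch below is not how drx.FutureDataFrame is implemented; LazyResult and slow_predict are hypothetical names, and a plain list stands in for a DataFrame.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class LazyResult:
    """Hypothetical future-backed wrapper; not drx.FutureDataFrame."""

    def __init__(self, future):
        self._future = future

    def __getattr__(self, name):
        # Called only for attributes not found on the wrapper itself.
        # future.result() blocks until the background work has finished.
        return getattr(self._future.result(), name)


def slow_predict(rows):
    # Stand-in for a long-running remote computation
    time.sleep(0.05)
    return [row * 2 for row in rows]


executor = ThreadPoolExecutor(max_workers=1)

# submit() returns a Future right away, so constructing the wrapper does not block
preds = LazyResult(executor.submit(slow_predict, [1, 2, 3]))

# First access to the underlying data waits for the result, then delegates
count_of_fours = preds.count(4)

executor.shutdown()
```

The design choice here is that callers never handle the future directly: they use the wrapper as if it were the finished result, and any waiting happens at the point of first use.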

Waiting for fit() to complete in a notebook before predicting or deploying#


model.fit(df, target='your_target_col')

# This predict() will make predictions as soon as a trained model is available
model.predict(score)

# This predict() call blocks the notebook until autopilot has completed
model.predict(score, wait_for_autopilot=True)

# When executed from a stand-alone script, both will block until fit() is complete
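The difference between the two predict() calls can be pictured as waiting on two different signals. The following is a rough sketch under stated assumptions rather than drx's implementation: SketchTrainingRun, its events, and the model names are all hypothetical, and asyncio.run() is used so the example runs as a stand-alone script.

```python
import asyncio


class SketchTrainingRun:
    """Hypothetical stand-in for a training run; not part of drx."""

    def __init__(self):
        self.first_model_ready = asyncio.Event()
        self.autopilot_done = asyncio.Event()
        self.best_model = None

    async def run(self):
        # Pretend three models finish training one after another
        for name in ("model_a", "model_b", "model_c"):
            await asyncio.sleep(0.01)
            self.best_model = name
            self.first_model_ready.set()
        self.autopilot_done.set()

    async def predict(self, wait_for_autopilot=False):
        # Choose which signal to wait for before using a model
        if wait_for_autopilot:
            gate = self.autopilot_done
        else:
            gate = self.first_model_ready
        await gate.wait()
        return self.best_model


async def demo():
    training_run = SketchTrainingRun()
    training = asyncio.create_task(training_run.run())
    early = await training_run.predict()
    final = await training_run.predict(wait_for_autopilot=True)
    await training
    return early, final


early_model, final_model = asyncio.run(demo())
```

In this sketch the first call returns as soon as any model exists, while the second returns only after the whole run has finished, so it sees the last model produced.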

FAQ#

How is drx different from the existing Python API experience?#

The existing Python client is extremely flexible, powerful, and configurable. However, certain common workflows may require multiple intermediate steps, and the learning curve can be steep for new users.

drx aims to provide a streamlined experience for the most common workflows while also offering new, experimental high-level abstractions.

Will drx be incorporated into the main Python API?#

drx uses the same code repository, CI/CD pipeline, unit testing, and code hygiene standards as the Python API, and code is shared between the projects. Installing drx will install the main DataRobot Python client; installing the main Python client will not install drx.

Certain prototypes and concepts from drx may eventually be included in the main Python API if they prove helpful to users.

Is drx the same as the “Idiomatic Python SDK” project?#

drx is focused on exploring, prototyping and validating longer-term experimental concepts and provides no guarantees of backwards-compatibility or feature completeness.

Can I use drx in my code?#

drx can be installed and used like any other Python package (pip install datarobotx). Bear in mind that the package relies on features that are in public and private preview, so some functionality may not work if those features are unavailable or not enabled. drx functionality is subject to change and there is no guarantee of backward compatibility.

Certain modules within drx may be documented as experimental and thus should not be used in production-critical code.

Can I use drx in DataRobot Notebooks?#

Yes, drx can be installed and used in DataRobot Notebooks. You may see the package behave differently in DR Notebooks than in other notebook implementations (e.g. Jupyter, Colab, etc.) due to differences in the underlying execution environment. These differences should diminish as DR Notebooks and drx continue to develop.

How do I contribute to drx or share feedback?#

Feedback and feature requests are welcome! You can ask questions or leave feedback in the DataRobot Community. Please use the drx label.

DataRobot employees only: you can also reach us in the #drx-in-sdk Slack channel. The drx codebase is hosted in the public-api-client repo and is open to contributions from all DataRobot staff.

Does drx collect any customer information?#

drx does not collect any customer code or data as part of the analytics that DataRobot captures. Information about which methods in drx or datarobot are called is captured, but this can be turned off by setting the environment variable DATAROBOT_ENABLE_CONSUMER_TRACKING=0 or by adding enable_api_consumer_tracking: no to drconfig.yaml.
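For a script or notebook, the environment variable can also be set from Python itself. The sketch below assumes the variable is read when the client is imported or first configured, which is not confirmed here, so it sets the value before any DataRobot import; tracking_flag is just an illustrative name.

```python
import os

# Assumption: the setting is read at import or first use, so set it first
os.environ["DATAROBOT_ENABLE_CONSUMER_TRACKING"] = "0"

# Read the value back to confirm what later imports in this process will see
tracking_flag = os.environ["DATAROBOT_ENABLE_CONSUMER_TRACKING"]
```

Setting the variable in the shell or in drconfig.yaml, as described above, avoids depending on import order.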