Relationship
- class datarobotx.Relationship(X, keys, temporal_key=None, feature_windows=None, dataset_name=<factory>)
Secondary dataset relationship definition.
Can be used to configure a FeatureDiscoveryModel.
Note that Relationship can only be used for relationships between the primary and secondary datasets. The DataRobot SDK should be used to define more complex relationships.
- Parameters:
X (pd.DataFrame or str) – The primary dataset to use in feature discovery
keys (str or tuple[str] or list[tuple[str]]) – Column name(s) of the feature(s) to be used for a join. If a scalar string, the key is assumed to have the same name in both the primary and secondary datasets. If a tuple, it maps the single join key in the primary dataset to the corresponding key in the secondary dataset. If a list of tuples, each tuple maps a key in the primary dataset to its corresponding key in the secondary dataset.
temporal_key (str, optional) – The column name to use in a lookback window
feature_windows (tuple or list[tuple], optional) –
A tuple with the following three elements to govern feature discovery: (start, end, unit).
start: int, how far back to look to aggregate features.
end: int, the stopping point for aggregation.
unit: str, the unit of time to use for aggregation, e.g. 'MILLISECOND', 'SECOND', 'MINUTE', 'HOUR', 'DAY', 'WEEK', 'MONTH', 'QUARTER', 'YEAR'.
Example: (-14, -7, 'DAY') will create aggregated features from 14 days ago until 7 days ago.
Can also be provided as a list of multiple feature discovery window tuples.
dataset_name (str, optional) – The name of the dataset in the feature discovery relationship graph. It is generated automatically if omitted.
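The three accepted forms of keys can be easier to compare side by side. The sketch below is illustrative only: normalize_keys is a hypothetical helper, not part of datarobotx, and it simply expands each form described above into explicit (primary, secondary) column pairs. The column names are made up for the example.

```python
from typing import List, Tuple, Union

# The three shapes the `keys` parameter is documented to accept.
KeySpec = Union[str, Tuple[str, str], List[Tuple[str, str]]]


def normalize_keys(keys: KeySpec) -> List[Tuple[str, str]]:
    """Hypothetical helper (not a datarobotx function).

    Expands any documented `keys` form into a list of
    (primary_column, secondary_column) pairs.
    """
    if isinstance(keys, str):
        # Scalar string: the same column name is used in both datasets.
        return [(keys, keys)]
    if isinstance(keys, tuple):
        # Single tuple: one primary column mapped to one secondary column.
        primary, secondary = keys
        return [(primary, secondary)]
    # List of tuples: one mapping per join column.
    return [(primary, secondary) for primary, secondary in keys]


# Illustrative column names only.
print(normalize_keys("customer_id"))
print(normalize_keys(("customer_id", "cust_id")))
print(normalize_keys([("customer_id", "cust_id"), ("region", "region_code")]))
```

Whether the library normalizes keys this way internally is an assumption; the sketch only restates the mapping rules given in the parameter description.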
Attributes:
dataset_name
feature_windows
keys
temporal_key
X
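To make the feature_windows example concrete, the following sketch turns a (start, end, unit) tuple into the two timestamps it describes relative to a reference time. It is a minimal illustration under stated assumptions: window_bounds and the unit table are hypothetical names, not part of datarobotx, and only fixed-length units are covered because 'MONTH', 'QUARTER' and 'YEAR' vary in length. How DataRobot treats the boundaries (inclusive or exclusive) is not specified here.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Assumption: only fixed-length units are handled in this sketch.
# 'MONTH', 'QUARTER' and 'YEAR' are omitted because their length varies.
UNIT_TO_DELTA = {
    "MILLISECOND": timedelta(milliseconds=1),
    "SECOND": timedelta(seconds=1),
    "MINUTE": timedelta(minutes=1),
    "HOUR": timedelta(hours=1),
    "DAY": timedelta(days=1),
    "WEEK": timedelta(weeks=1),
}


def window_bounds(
    reference: datetime, window: Tuple[int, int, str]
) -> Tuple[datetime, datetime]:
    """Hypothetical helper (not a datarobotx function).

    Returns the (window_start, window_end) timestamps that a
    (start, end, unit) tuple describes relative to `reference`.
    """
    start, end, unit = window
    step = UNIT_TO_DELTA[unit]
    # Negative offsets point backwards in time from the reference.
    return reference + start * step, reference + end * step


reference = datetime(2024, 1, 15)
window_start, window_end = window_bounds(reference, (-14, -7, "DAY"))
print(window_start.date(), window_end.date())  # 2024-01-01 2024-01-08
```

With a reference time of 15 January 2024, the window (-14, -7, 'DAY') spans 1 January to 8 January, matching "from 14 days ago until 7 days ago".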