spark_to_ai_catalog

datarobotx.spark_to_ai_catalog(spark_df, name, max_rows=None)

Upload a Spark dataframe to AI Catalog.

Does not attempt to downsample. If the DataRobot ‘uploading of data files in stages’ feature flag is available, the data is sent as a multipart upload. Without this feature flag, HTTP uploads of large files can fail intermittently.
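Because large uploads can fail intermittently, a caller may want to retry. The following is a minimal sketch of one way to do that, not part of datarobotx: `upload_with_retry` and its parameters are hypothetical names, and it assumes that any exception may be transient and that repeating the upload is safe in your environment.

```python
import time


def upload_with_retry(upload, attempts=3, delay_seconds=5.0):
    """Call upload() until it succeeds or the attempts are exhausted.

    upload is a hypothetical zero-argument callable, for example
    lambda: drx.spark_to_ai_catalog(spark_df, "my dataset").
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return upload()
        except Exception:
            # Assumption: any exception might be a transient HTTP failure.
            # Narrow this to the error types you actually observe.
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)
```

The upload is passed in as a callable so the retry logic stays independent of Spark and DataRobot. Before retrying in practice, check whether a failed attempt left a partial entry behind.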

Parameters:
  • spark_df (pyspark.sql.DataFrame) – Spark dataframe to be uploaded to AI Catalog

  • name (str) – Name for the resulting AI Catalog entry

  • max_rows (int, optional) – Maximum number of rows from the dataframe to upload to AI Catalog

Returns:

DataRobot dataset id of the resulting catalog entry

Return type:

dataset_id
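A minimal usage sketch based on the signature above. The wrapper name `upload_dataframe` is hypothetical, and the sketch assumes DataRobot credentials have already been configured for datarobotx and that `spark_df` is an existing pyspark.sql.DataFrame.

```python
def upload_dataframe(spark_df, name, max_rows=None):
    """Upload spark_df to AI Catalog and return the resulting dataset id."""
    # Imported inside the function so the sketch can be defined
    # even where datarobotx is not installed.
    import datarobotx as drx

    # Arguments follow the documented signature:
    # spark_to_ai_catalog(spark_df, name, max_rows=None)
    dataset_id = drx.spark_to_ai_catalog(spark_df, name, max_rows=max_rows)
    return dataset_id
```

Calling `upload_dataframe(df, "my dataset", max_rows=100000)` would return the dataset id of the new catalog entry, which can then be used to refer to that dataset in DataRobot.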

See also

downsample_spark

Downsample Spark dataframes that are too large for DataRobot