enrich

datarobotx.llm.enrich(question, using, default_cache=True, verbose=False)

Enrich structured data with completions from an LLM or chain.

Convenience function for usage with pandas.DataFrame.apply():

  • Caches duplicate enrichment completions

  • Provides progress updates

  • Automatically maps pandas row or column values into the provided question template

  • ML-oriented default contextual prompts and chains:
    • Attempts to infer the appropriate completion type (numeric, categorical, date, or free text) and instructs the LLM accordingly

    • Includes prior completions in successive prompts to encourage consistency (e.g. date formatting, categorical levels)

  • Customizable: interoperates with custom langchain Chains, Tools, and LLMs
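
The row-to-question mapping described above can be illustrated with plain pandas. This is a minimal sketch of the idea only, not the datarobotx implementation: the helper name format_question and the sample data are hypothetical, and it assumes the question template is filled in with str.format using the row's column names as keys.

```python
import pandas as pd

# Question template with a placeholder named after a DataFrame column
question = 'Is "{emp_title}" a Fortune 500 company (Y/N)?'

# Small illustrative frame (made-up values)
df = pd.DataFrame({"emp_title": ["Acme Corp", "US Army"],
                   "loan_amnt": [5000, 2500]})


def format_question(row: pd.Series) -> str:
    """Hypothetical helper: fill the template from one row's values."""
    # str.format ignores keyword arguments the template does not reference,
    # so unrelated columns such as loan_amnt are simply unused.
    return question.format(**row.to_dict())


# axis=1 passes each row to the helper as a pandas Series
prompts = df.apply(format_question, axis=1)
```

Each element of prompts is the fully formatted question for the corresponding row; in the enrichment flow, a string like this is what would be sent to the LLM or chain.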

Parameters:
  • question (str) – Question to be answered to enrich the dataset. Provided as a Python format string (f-string-style template) whose placeholders are filled with data from other fields in the DataFrame row or column

  • using (langchain.llms.BaseLLM or langchain.chains.base.Chain) – Langchain abstraction used to answer the question. If a custom chain or tool is provided, the question is formatted for each row/column in the DataFrame and then passed as the first argument to the chain's run() method

  • default_cache (bool, default = True) – If True, an InMemoryCache will be initialized and used for the lifecycle of the returned function; caching reduces API consumption from duplicate completions

  • verbose (bool, default = False) – If True, default enrichment LLMChains will be run with verbose output
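
To show why caching reduces API consumption, here is a rough, standard-library-only sketch of a callable that remembers completions for prompts it has already seen. It is an illustration under stated assumptions, not how datarobotx or langchain's InMemoryCache is implemented: make_cached_enricher and fake_llm are hypothetical names, and fake_llm stands in for a real LLM call.

```python
from typing import Callable, Dict, List, Mapping


def make_cached_enricher(
    question: str, complete: Callable[[str], str]
) -> Callable[[Mapping[str, object]], str]:
    """Hypothetical factory: returns a per-row callable with its own cache."""
    cache: Dict[str, str] = {}  # lives as long as the returned function

    def enricher(row: Mapping[str, object]) -> str:
        prompt = question.format(**row)
        if prompt not in cache:
            # Only prompts not seen before reach the (expensive) completion call
            cache[prompt] = complete(prompt)
        return cache[prompt]

    return enricher


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM; records each call so cache hits are visible."""
    calls.append(prompt)
    return "Y" if "Army" in prompt else "N"


enricher = make_cached_enricher(
    'Is "{emp_title}" a large government organization (Y/N)?', fake_llm
)
rows = [{"emp_title": "US Army"}, {"emp_title": "Acme"}, {"emp_title": "US Army"}]
answers = [enricher(r) for r in rows]
```

Three rows are enriched, but the stand-in LLM is called only twice because the repeated "US Army" prompt is served from the cache.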

Returns:

Function that can be used directly by pandas.DataFrame.apply() to perform the requested enrichment.

Return type:

Callable

Examples

>>> import pandas as pd
>>> import langchain
>>> from datarobotx.llm.chains.enrich import enrich
>>> llm = langchain.llms.OpenAI(model_name="text-davinci-003")
>>> df = pd.read_csv('https://s3.amazonaws.com/datarobot_public_datasets/' +
...                  '10K_2007_to_2011_Lending_Club_Loans_v2_mod_80.csv')
>>> df_test = df[:5].copy(deep=True)
>>> df_test['f500_or_gov'] = df_test.apply(enrich('Is "{emp_title}" a Fortune 500 company or ' +
...                                               'large government organization (Y/N)?', llm),
...                                        axis=1)