Catalog System

Overview

The catalog system manages metadata for database tables, columns, and business rules.

Catalog Store

class openchatbi.catalog.catalog_store.CatalogStore[source]

Bases: ABC

Abstract base class defining the storage interface for data catalog (database, table, column definitions, descriptions, and additional prompts).

Common columns which have same meanings across tables will be store centralized to avoid data duplication.

Column attribute:

  • column_name: the name of the column

  • display_name: the display name of the column

  • type: the data type of the column

  • category: dimension or metric

  • description: the description of the column

  • is_common: is common column or not

abstractmethod get_data_warehouse_config() dict[source]

Get the data warehouse configuration

Returns:

Data warehouse configuration

Return type:

dict

abstractmethod get_sql_engine() Engine[source]

Get the SQLAlchemy engine for the catalog

Returns:

SQLAlchemy engine

Return type:

Engine

abstractmethod get_database_list() list[str][source]

Get a list of all databases

Returns:

List of database names

Return type:

List[str]

abstractmethod get_table_list(database: str | None = None) list[str][source]

Get a list of all tables in the specified database, if database is None, return all tables

Parameters:

database (Optional[str]) – Database name

Returns:

List of table names

Return type:

List[str]

abstractmethod get_column_list(table: str | None = None, database: str | None = None) list[dict[str, Any]][source]

Get all column information for the specified table, if table is None, return all common columns in the catalog

Parameters:
  • table (Optional[str]) – Table name

  • database (Optional[str]) – Database name

Returns:

List of column information, each column contains name, type, description, etc.

Return type:

List[Dict[str, Any]]

abstractmethod get_table_information(table: str, database: str | None = None) dict[str, Any][source]

Get the information for the specified table

Parameters:
  • table (str) – Table name

  • database (Optional[str]) – Database name

Returns:

Table information, including description text, selection rules, etc.

Return type:

Dict[str, Any]

abstractmethod get_sql_examples(table: str | None = None, database: str | None = None) list[tuple[str, str, list[str]]][source]

Get SQL examples

Parameters:
  • table (Optional[str]) – Table name

  • database (Optional[str]) – Database name

Returns:

List of SQL examples, each example is a Tuple-3 as (question, SQL, full_table_names)

Return type:

List[Tuple[str, str, List[str]]]

abstractmethod get_table_selection_examples() list[tuple[str, list[str]]][source]

Get table selection examples

Returns:

List of table selection examples, each example is a Tuple-2 as (question, selected tables)

Return type:

List[Tuple[str, List[str]]]

abstractmethod save_table_information(table: str, information: dict[str, Any], columns: list[dict[str, Any]], database: str | None = None, update_existing: bool = False) bool[source]

Save the information and columns for a table

Parameters:
  • table (str) – Table name

  • information (Dict[str, Any]) – Table information

  • columns (List[Dict[str, Any]]) – List of column information, each column dict contains at lease column_name, type, category, description

  • database (Optional[str]) – Database name

  • update_existing (bool) – Update existing table and column information

Returns:

Whether the save was successful

Return type:

bool

abstractmethod save_table_sql_examples(table: str, examples: list[dict[str, str]], database: str | None = None) bool[source]

Save SQL examples for a table

Parameters:
  • table (str) – Table name

  • examples (List[Dict[str, str]]) – List of SQL examples

  • database (Optional[str]) – Database name

Returns:

Whether the save was successful

Return type:

bool

abstractmethod save_table_selection_examples(examples: list[tuple[str, list[str]]]) bool[source]

Save table selection examples

Parameters:

examples (List[Tuple[str, List[str]]]) – List of table selection examples

Returns:

Whether the save was successful

Return type:

bool

abstractmethod check_exists() bool[source]

Check if the catalog store has existing data/content

Returns:

True if catalog store has existing data, False if empty or missing essential files

Return type:

bool

openchatbi.catalog.catalog_store.split_db_table_name(table: str, database: str | None = None) tuple[str, str, str][source]

Split full table name into db name and table name :param table: if database is None, should be full table name like db.table, otherwise should be only table name :type table: str :param database: Database name :type database: Optional[str]

Returns:

full_table_name, db_name, table_name

Return type:

Tuple[str, str, str]

Filesystem Implementation

File system-based catalog store implementation.

class openchatbi.catalog.store.file_system.FileSystemCatalogStore(data_path: str, data_warehouse_config: dict)[source]

Bases: CatalogStore

File system-based data catalog storage implementation.

Stores catalog data in CSV and YAML files on the local filesystem.

__init__(data_path: str, data_warehouse_config: dict)[source]

Initialize filesystem catalog store.

Parameters:
  • data_path (str) – Directory absolute path for storing catalog files.

  • data_warehouse_config (dict) – Data warehouse configuration dictionary with keys: - uri (str): Database connection URI - include_tables (Optional[List[str]]): List of tables to include, if None include all - database_name (Optional[str]): Database name to use in catalog

data_path: str
table_info_file: str
sql_example_file: str
table_selection_example_file: str
table_columns_file: str
common_columns_file: str
table_spec_columns_file: str
get_data_warehouse_config() dict[source]

Get the data warehouse configuration

Returns:

Data warehouse configuration

Return type:

dict

get_sql_engine() Engine[source]

Get the SQLAlchemy engine for the catalog

Returns:

SQLAlchemy engine

Return type:

Engine

get_database_list() list[str][source]

Get a list of all databases

Returns:

List of database names

Return type:

List[str]

get_table_list(database: str | None = None) list[str][source]

Get a list of all tables in the specified database, if database is None, return all tables

Parameters:

database (Optional[str]) – Database name

Returns:

List of table names

Return type:

List[str]

get_column_list(table: str | None = None, database: str | None = None) list[dict[str, Any]][source]

Get all column information for the specified table, if table is None, return all common columns in the catalog

Parameters:
  • table (Optional[str]) – Table name

  • database (Optional[str]) – Database name

Returns:

List of column information, each column contains name, type, description, etc.

Return type:

List[Dict[str, Any]]

get_table_information(table: str, database: str | None = None) dict[str, Any][source]

Get the information for the specified table

Parameters:
  • table (str) – Table name

  • database (Optional[str]) – Database name

Returns:

Table information, including description text, selection rules, etc.

Return type:

Dict[str, Any]

get_sql_examples(table: str | None = None, database: str | None = None) list[tuple[str, str, list[str]]][source]

Get SQL examples

Parameters:
  • table (Optional[str]) – Table name

  • database (Optional[str]) – Database name

Returns:

List of SQL examples, each example is a Tuple-3 as (question, SQL, full_table_names)

Return type:

List[Tuple[str, str, List[str]]]

get_table_selection_examples() list[tuple[str, list[str]]][source]

Get table selection examples

Returns:

List of table selection examples, each example is a Tuple-2 as (question, selected tables)

Return type:

List[Tuple[str, List[str]]]

save_table_information(table: str, information: dict[str, Any], columns: list[dict[str, Any]], database: str | None = None, update_existing: bool = False) bool[source]

Save the information and columns for a table

Parameters:
  • table (str) – Table name

  • information (Dict[str, Any]) – Table information

  • columns (List[Dict[str, Any]]) – List of column information, each column dict contains at lease column_name, type, category, description

  • database (Optional[str]) – Database name

  • update_existing (bool) – Update existing table and column information

Returns:

Whether the save was successful

Return type:

bool

save_table_sql_examples(table: str, examples: list[dict[str, str]], database: str | None = None) bool[source]

Save SQL examples for a table

Parameters:
  • table (str) – Table name

  • examples (List[Dict[str, str]]) – List of SQL examples

  • database (Optional[str]) – Database name

Returns:

Whether the save was successful

Return type:

bool

save_table_selection_examples(examples: list[tuple[str, list[str]]]) bool[source]

Save table selection examples

Parameters:

examples (List[Tuple[str, List[str]]]) – List of table selection examples

Returns:

Whether the save was successful

Return type:

bool

check_exists() bool[source]

Check if the catalog store has existing data/content

Returns:

True if catalog store has existing data, False if empty or missing essential files

Return type:

bool

Catalog Loader

class openchatbi.catalog.catalog_loader.DataCatalogLoader(engine: Engine, include_tables: list[str] | None = None)[source]

Bases: object

The loader to load data catalog from data warehouse metadata and save to catalog store.

__init__(engine: Engine, include_tables: list[str] | None = None)[source]

Initialize catalog loader.

Parameters:
  • engine (Engine) – SQLAlchemy engine instance

  • include_tables (Optional[List[str]]) – List of table names to include, None for all

get_tables_and_columns() dict[str, list[dict[str, Any]]][source]

Extract table and column metadata including comments using SQLAlchemy inspector.

Returns:

Dictionary mapping table names to list of column information

Return type:

Dict[str, List[Dict[str, Any]]]

get_table_indexes(table_name: str) list[dict[str, Any]][source]

Get index information for a specific table.

Parameters:

table_name (str) – Name of the table

Returns:

List of index information

Return type:

List[Dict[str, Any]]

get_foreign_keys(table_name: str) list[dict[str, Any]][source]

Get foreign key information for a specific table.

Parameters:

table_name (str) – Name of the table

Returns:

List of foreign key information

Return type:

List[Dict[str, Any]]

save_to_catalog_store(catalog_store: CatalogStore, database_name: str | None = None, update: bool = False) bool[source]

Extract warehouse metadata and save to catalog store.

Parameters:
  • catalog_store (CatalogStore) – Target catalog store to load data to

  • database_name (Optional[str]) – Database name in catalog, defaults to ‘default’

  • update (bool) – Update existing catalog store to sync with data warehouse

Returns:

True if load was successful, False otherwise

Return type:

bool

openchatbi.catalog.catalog_loader.load_catalog_from_data_warehouse(catalog_store: CatalogStore) bool[source]

Load catalog data from data warehouse using SQLAlchemy based on data warehouse config (URI)

Main entry point for catalog loading.

Parameters:

catalog_store (CatalogStore) – Target catalog store

Returns:

True if load was successful, False otherwise

Return type:

bool

Schema Retrieval