Catalog System
Overview
The catalog system manages metadata for database tables, columns, and business rules.
Catalog Store
- class openchatbi.catalog.catalog_store.CatalogStore[source]
Bases:
ABCAbstract base class defining the storage interface for data catalog (database, table, column definitions, descriptions, and additional prompts).
Common columns which have same meanings across tables will be store centralized to avoid data duplication.
Column attribute:
column_name: the name of the column
display_name: the display name of the column
type: the data type of the column
category: dimension or metric
description: the description of the column
is_common: is common column or not
- abstractmethod get_data_warehouse_config() dict[source]
Get the data warehouse configuration
- Returns:
Data warehouse configuration
- Return type:
dict
- abstractmethod get_sql_engine() Engine[source]
Get the SQLAlchemy engine for the catalog
- Returns:
SQLAlchemy engine
- Return type:
Engine
- abstractmethod get_database_list() list[str][source]
Get a list of all databases
- Returns:
List of database names
- Return type:
List[str]
- abstractmethod get_table_list(database: str | None = None) list[str][source]
Get a list of all tables in the specified database, if database is None, return all tables
- Parameters:
database (Optional[str]) – Database name
- Returns:
List of table names
- Return type:
List[str]
- abstractmethod get_column_list(table: str | None = None, database: str | None = None) list[dict[str, Any]][source]
Get all column information for the specified table, if table is None, return all common columns in the catalog
- Parameters:
table (Optional[str]) – Table name
database (Optional[str]) – Database name
- Returns:
List of column information, each column contains name, type, description, etc.
- Return type:
List[Dict[str, Any]]
- abstractmethod get_table_information(table: str, database: str | None = None) dict[str, Any][source]
Get the information for the specified table
- Parameters:
table (str) – Table name
database (Optional[str]) – Database name
- Returns:
Table information, including description text, selection rules, etc.
- Return type:
Dict[str, Any]
- abstractmethod get_sql_examples(table: str | None = None, database: str | None = None) list[tuple[str, str, list[str]]][source]
Get SQL examples
- Parameters:
table (Optional[str]) – Table name
database (Optional[str]) – Database name
- Returns:
List of SQL examples, each example is a Tuple-3 as (question, SQL, full_table_names)
- Return type:
List[Tuple[str, str, List[str]]]
- abstractmethod get_table_selection_examples() list[tuple[str, list[str]]][source]
Get table selection examples
- Returns:
List of table selection examples, each example is a Tuple-2 as (question, selected tables)
- Return type:
List[Tuple[str, List[str]]]
- abstractmethod save_table_information(table: str, information: dict[str, Any], columns: list[dict[str, Any]], database: str | None = None, update_existing: bool = False) bool[source]
Save the information and columns for a table
- Parameters:
table (str) – Table name
information (Dict[str, Any]) – Table information
columns (List[Dict[str, Any]]) – List of column information, each column dict contains at lease column_name, type, category, description
database (Optional[str]) – Database name
update_existing (bool) – Update existing table and column information
- Returns:
Whether the save was successful
- Return type:
bool
- abstractmethod save_table_sql_examples(table: str, examples: list[dict[str, str]], database: str | None = None) bool[source]
Save SQL examples for a table
- Parameters:
table (str) – Table name
examples (List[Dict[str, str]]) – List of SQL examples
database (Optional[str]) – Database name
- Returns:
Whether the save was successful
- Return type:
bool
- openchatbi.catalog.catalog_store.split_db_table_name(table: str, database: str | None = None) tuple[str, str, str][source]
Split full table name into db name and table name :param table: if database is None, should be full table name like db.table, otherwise should be only table name :type table: str :param database: Database name :type database: Optional[str]
- Returns:
full_table_name, db_name, table_name
- Return type:
Tuple[str, str, str]
Filesystem Implementation
File system-based catalog store implementation.
- class openchatbi.catalog.store.file_system.FileSystemCatalogStore(data_path: str, data_warehouse_config: dict)[source]
Bases:
CatalogStoreFile system-based data catalog storage implementation.
Stores catalog data in CSV and YAML files on the local filesystem.
- __init__(data_path: str, data_warehouse_config: dict)[source]
Initialize filesystem catalog store.
- Parameters:
data_path (str) – Directory absolute path for storing catalog files.
data_warehouse_config (dict) – Data warehouse configuration dictionary with keys: - uri (str): Database connection URI - include_tables (Optional[List[str]]): List of tables to include, if None include all - database_name (Optional[str]): Database name to use in catalog
- data_path: str
- table_info_file: str
- sql_example_file: str
- table_selection_example_file: str
- table_columns_file: str
- common_columns_file: str
- table_spec_columns_file: str
- get_data_warehouse_config() dict[source]
Get the data warehouse configuration
- Returns:
Data warehouse configuration
- Return type:
dict
- get_sql_engine() Engine[source]
Get the SQLAlchemy engine for the catalog
- Returns:
SQLAlchemy engine
- Return type:
Engine
- get_database_list() list[str][source]
Get a list of all databases
- Returns:
List of database names
- Return type:
List[str]
- get_table_list(database: str | None = None) list[str][source]
Get a list of all tables in the specified database, if database is None, return all tables
- Parameters:
database (Optional[str]) – Database name
- Returns:
List of table names
- Return type:
List[str]
- get_column_list(table: str | None = None, database: str | None = None) list[dict[str, Any]][source]
Get all column information for the specified table, if table is None, return all common columns in the catalog
- Parameters:
table (Optional[str]) – Table name
database (Optional[str]) – Database name
- Returns:
List of column information, each column contains name, type, description, etc.
- Return type:
List[Dict[str, Any]]
- get_table_information(table: str, database: str | None = None) dict[str, Any][source]
Get the information for the specified table
- Parameters:
table (str) – Table name
database (Optional[str]) – Database name
- Returns:
Table information, including description text, selection rules, etc.
- Return type:
Dict[str, Any]
- get_sql_examples(table: str | None = None, database: str | None = None) list[tuple[str, str, list[str]]][source]
Get SQL examples
- Parameters:
table (Optional[str]) – Table name
database (Optional[str]) – Database name
- Returns:
List of SQL examples, each example is a Tuple-3 as (question, SQL, full_table_names)
- Return type:
List[Tuple[str, str, List[str]]]
- get_table_selection_examples() list[tuple[str, list[str]]][source]
Get table selection examples
- Returns:
List of table selection examples, each example is a Tuple-2 as (question, selected tables)
- Return type:
List[Tuple[str, List[str]]]
- save_table_information(table: str, information: dict[str, Any], columns: list[dict[str, Any]], database: str | None = None, update_existing: bool = False) bool[source]
Save the information and columns for a table
- Parameters:
table (str) – Table name
information (Dict[str, Any]) – Table information
columns (List[Dict[str, Any]]) – List of column information, each column dict contains at lease column_name, type, category, description
database (Optional[str]) – Database name
update_existing (bool) – Update existing table and column information
- Returns:
Whether the save was successful
- Return type:
bool
- save_table_sql_examples(table: str, examples: list[dict[str, str]], database: str | None = None) bool[source]
Save SQL examples for a table
- Parameters:
table (str) – Table name
examples (List[Dict[str, str]]) – List of SQL examples
database (Optional[str]) – Database name
- Returns:
Whether the save was successful
- Return type:
bool
Catalog Loader
- class openchatbi.catalog.catalog_loader.DataCatalogLoader(engine: Engine, include_tables: list[str] | None = None)[source]
Bases:
objectThe loader to load data catalog from data warehouse metadata and save to catalog store.
- __init__(engine: Engine, include_tables: list[str] | None = None)[source]
Initialize catalog loader.
- Parameters:
engine (Engine) – SQLAlchemy engine instance
include_tables (Optional[List[str]]) – List of table names to include, None for all
- get_tables_and_columns() dict[str, list[dict[str, Any]]][source]
Extract table and column metadata including comments using SQLAlchemy inspector.
- Returns:
Dictionary mapping table names to list of column information
- Return type:
Dict[str, List[Dict[str, Any]]]
- get_table_indexes(table_name: str) list[dict[str, Any]][source]
Get index information for a specific table.
- Parameters:
table_name (str) – Name of the table
- Returns:
List of index information
- Return type:
List[Dict[str, Any]]
- get_foreign_keys(table_name: str) list[dict[str, Any]][source]
Get foreign key information for a specific table.
- Parameters:
table_name (str) – Name of the table
- Returns:
List of foreign key information
- Return type:
List[Dict[str, Any]]
- save_to_catalog_store(catalog_store: CatalogStore, database_name: str | None = None, update: bool = False) bool[source]
Extract warehouse metadata and save to catalog store.
- Parameters:
catalog_store (CatalogStore) – Target catalog store to load data to
database_name (Optional[str]) – Database name in catalog, defaults to ‘default’
update (bool) – Update existing catalog store to sync with data warehouse
- Returns:
True if load was successful, False otherwise
- Return type:
bool
- openchatbi.catalog.catalog_loader.load_catalog_from_data_warehouse(catalog_store: CatalogStore) bool[source]
Load catalog data from data warehouse using SQLAlchemy based on data warehouse config (URI)
Main entry point for catalog loading.
- Parameters:
catalog_store (CatalogStore) – Target catalog store
- Returns:
True if load was successful, False otherwise
- Return type:
bool