Text2SQL System

Overview

Natural language to SQL conversion pipeline with schema linking and prompt engineering.

SQL Graph

SQL generation graph construction and execution.

openchatbi.text2sql.sql_graph.ask_human(state)[source]

Node function to ask human for additional information or clarification.

Parameters:

state (SQLGraphState) – The current SQL graph state containing messages and context.

Returns:

Updated state with human feedback as a tool message and user input.

Return type:

dict

openchatbi.text2sql.sql_graph.route_after_confidence(state: SQLGraphState) str[source]

Route after the confidence gate based on the human decision.

approve -> visualization; reject -> regenerate; edit -> re-execute the user-edited SQL. Defaults to visualization when no decision is present (gate disabled or score above threshold).

openchatbi.text2sql.sql_graph.build_sql_graph(catalog: CatalogStore, checkpointer: None | bool | BaseCheckpointSaver, memory_store: BaseStore, llm_provider: str | None = None) CompiledStateGraph[source]

Build SQL generation graph with all nodes and edges.

Parameters:
  • catalog – Catalog store containing schema information.

  • checkpointer – The Checkpointer for state persistence (short memory). If None, no short memory.

  • memory_store – The BaseStore to use for long-term memory. If None, no long-term memory.

Returns:

Compiled SQL graph ready for execution.

Return type:

CompiledStateGraph

SQL Generation

openchatbi.text2sql.generate_sql.create_sql_nodes(llm: BaseChatModel, catalog: CatalogStore, dialect: str, visualization_mode: str | None = 'rule', learned_sql_store: Any | None = None) tuple[Callable, Callable, Callable, Callable, Callable, Callable][source]

Creates the SQL processing nodes for LangGraph.

Parameters:
  • llm (BaseChatModel) – The language model to use for SQL generation.

  • catalog (CatalogStore) – The catalog store containing schema information.

  • dialect (str) – The SQL dialect to use (e.g., ‘presto’, ‘mysql’).

  • visualization_mode (str | None) – Visualization analysis mode (“rule”, “llm”, or None to skip).

  • learned_sql_store – Optional LearnedSQLStore handle for blended retrieval and gated auto-capture. When None, falls back to the legacy static retriever and capture is a no-op (default-off, zero regression).

Returns:

Six node functions (generate, execute, regenerate, visualization, score_sql, confidence_gate)

Return type:

tuple

openchatbi.text2sql.generate_sql.should_execute_sql(state: SQLGraphState) str[source]

Conditional edge function to determine if SQL should be executed.

Parameters:

state (SQLGraphState) – Current state

Returns:

Next node name - “execute_sql” if SQL is generated, “end” if done

Return type:

str

Schema Linking

Schema linking module for table and column selection in text2sql.

openchatbi.text2sql.schema_linking.schema_linking(llm: BaseChatModel, catalog: CatalogStore)[source]

Create function for schema linking: select appropriate tables and columns for a question.

Parameters:
  • llm (BaseChatModel) – Language model for table selection.

  • catalog (CatalogStore) – Catalog store with schema information.

Returns:

Node function for schema linking based on question.

Return type:

function

Information Extraction

Information extraction module for text2sql processing.

openchatbi.text2sql.extraction.information_extraction(llm: BaseChatModel) Callable[source]

Create function to extract information from questions.

Parameters:

llm (BaseChatModel) – Language model for information extraction.

Returns:

Node function that extracts information from questions.

Return type:

function

openchatbi.text2sql.extraction.information_extraction_conditional_edges(state: SQLGraphState)[source]

Determine next node after information extraction.

Parameters:

state (SQLGraphState) – Current SQL graph state.

Returns:

Next node (‘ask_human’, ‘search_knowledge’, ‘next’, or ‘end’).

Return type:

str

Text2SQL Utilities

Utility functions for text2sql retrieval systems.

class openchatbi.text2sql.text2sql_utils.LearnedSQLStore(vector_db, example_dict: dict, lock: threading.Lock | None = None)[source]

Bases: object

Runtime-mutable learned SQL knowledge base.

Wraps the text2sql vector store so that approved (source='golden') and auto-captured (source='auto') examples can be written at runtime and retrieved alongside the static catalog examples. Writes are guarded by a lock because SimpleStore.add_texts rebuilds the BM25 index O(N) and is not threadsafe; callers are responsible for the durable YAML half of the dual-write contract (the in-memory add_texts here is the volatile half).

__init__(vector_db, example_dict: dict, lock: threading.Lock | None = None)[source]
add(question: str, sql: str, tables: list[str], *, source: str, importance: float = 1.0, namespace: str = 'global') None[source]

Add a learned example to the runtime store (volatile half of dual-write).

Parameters:
  • question – Natural-language question (the indexed text).

  • sql – SQL answer.

  • tables – Tables used by the SQL.

  • source – Provenance; ‘golden’ (human-approved) or ‘auto’ (S2-gated capture).

  • importance – Base importance weight used by composite scoring.

  • namespace – Tenant/scope tag; ‘global’ must hold only schema-level patterns.

add_golden_sql(question: str, sql: str, tables: list[str]) None[source]

Alias: add a human-approved golden example with high importance.

retrieve(question: str, k: int = 10, score_fn=None) list[tuple[str, str, list[str]]][source]

Retrieve top-k learned examples for a question.

Parameters:
  • question – Query text.

  • k – Number of examples to return.

  • score_fn – Optional re-ranker (metadata, base_rank) -> float (e.g. composite_score from memory_scoring); higher is better. When None, the underlying MMR order is preserved.

Returns:

List of (question, sql, tables) tuples.