Text2SQL System
Overview
Natural language to SQL conversion pipeline with schema linking and prompt engineering.
SQL Graph
SQL generation graph construction and execution.
- openchatbi.text2sql.sql_graph.ask_human(state)[source]
Node function to ask human for additional information or clarification.
- Parameters:
state (SQLGraphState) – The current SQL graph state containing messages and context.
- Returns:
Updated state with human feedback as a tool message and user input.
- Return type:
dict
- openchatbi.text2sql.sql_graph.route_after_confidence(state: SQLGraphState) str[source]
Route after the confidence gate based on the human decision.
approve -> visualization; reject -> regenerate; edit -> re-execute the user-edited SQL. Defaults to visualization when no decision is present (gate disabled or score above threshold).
- openchatbi.text2sql.sql_graph.build_sql_graph(catalog: CatalogStore, checkpointer: None | bool | BaseCheckpointSaver, memory_store: BaseStore, llm_provider: str | None = None) CompiledStateGraph[source]
Build SQL generation graph with all nodes and edges.
- Parameters:
catalog – Catalog store containing schema information.
checkpointer – The Checkpointer for state persistence (short memory). If None, no short memory.
memory_store – The BaseStore to use for long-term memory. If None, no long-term memory.
- Returns:
Compiled SQL graph ready for execution.
- Return type:
CompiledStateGraph
SQL Generation
- openchatbi.text2sql.generate_sql.create_sql_nodes(llm: BaseChatModel, catalog: CatalogStore, dialect: str, visualization_mode: str | None = 'rule', learned_sql_store: Any | None = None) tuple[Callable, Callable, Callable, Callable, Callable, Callable][source]
Creates the SQL processing nodes for LangGraph.
- Parameters:
llm (BaseChatModel) – The language model to use for SQL generation.
catalog (CatalogStore) – The catalog store containing schema information.
dialect (str) – The SQL dialect to use (e.g., ‘presto’, ‘mysql’).
visualization_mode (str | None) – Visualization analysis mode (“rule”, “llm”, or None to skip).
learned_sql_store – Optional LearnedSQLStore handle for blended retrieval and gated auto-capture. When None, falls back to the legacy static retriever and capture is a no-op (default-off, zero regression).
- Returns:
Six node functions (generate, execute, regenerate, visualization, score_sql, confidence_gate)
- Return type:
tuple
- openchatbi.text2sql.generate_sql.should_execute_sql(state: SQLGraphState) str[source]
Conditional edge function to determine if SQL should be executed.
- Parameters:
state (SQLGraphState) – Current state
- Returns:
Next node name - “execute_sql” if SQL is generated, “end” if done
- Return type:
str
Schema Linking
Schema linking module for table and column selection in text2sql.
- openchatbi.text2sql.schema_linking.schema_linking(llm: BaseChatModel, catalog: CatalogStore)[source]
Create function for schema linking: select appropriate tables and columns for a question.
- Parameters:
llm (BaseChatModel) – Language model for table selection.
catalog (CatalogStore) – Catalog store with schema information.
- Returns:
Node function for schema linking based on question.
- Return type:
function
Information Extraction
Information extraction module for text2sql processing.
- openchatbi.text2sql.extraction.information_extraction(llm: BaseChatModel) Callable[source]
Create function to extract information from questions.
- Parameters:
llm (BaseChatModel) – Language model for information extraction.
- Returns:
Node function that extracts information from questions.
- Return type:
function
- openchatbi.text2sql.extraction.information_extraction_conditional_edges(state: SQLGraphState)[source]
Determine next node after information extraction.
- Parameters:
state (SQLGraphState) – Current SQL graph state.
- Returns:
Next node (‘ask_human’, ‘search_knowledge’, ‘next’, or ‘end’).
- Return type:
str
Text2SQL Utilities
Utility functions for text2sql retrieval systems.
- class openchatbi.text2sql.text2sql_utils.LearnedSQLStore(vector_db, example_dict: dict, lock: threading.Lock | None = None)[source]
Bases:
objectRuntime-mutable learned SQL knowledge base.
Wraps the text2sql vector store so that approved (
source='golden') and auto-captured (source='auto') examples can be written at runtime and retrieved alongside the static catalog examples. Writes are guarded by a lock becauseSimpleStore.add_textsrebuilds the BM25 index O(N) and is not threadsafe; callers are responsible for the durable YAML half of the dual-write contract (the in-memoryadd_textshere is the volatile half).- add(question: str, sql: str, tables: list[str], *, source: str, importance: float = 1.0, namespace: str = 'global') None[source]
Add a learned example to the runtime store (volatile half of dual-write).
- Parameters:
question – Natural-language question (the indexed text).
sql – SQL answer.
tables – Tables used by the SQL.
source – Provenance; ‘golden’ (human-approved) or ‘auto’ (S2-gated capture).
importance – Base importance weight used by composite scoring.
namespace – Tenant/scope tag; ‘global’ must hold only schema-level patterns.
- add_golden_sql(question: str, sql: str, tables: list[str]) None[source]
Alias: add a human-approved golden example with high importance.
- retrieve(question: str, k: int = 10, score_fn=None) list[tuple[str, str, list[str]]][source]
Retrieve top-k learned examples for a question.
- Parameters:
question – Query text.
k – Number of examples to return.
score_fn – Optional re-ranker
(metadata, base_rank) -> float(e.g. composite_score from memory_scoring); higher is better. When None, the underlying MMR order is preserved.
- Returns:
List of (question, sql, tables) tuples.