What
The semantic layer sits between raw data and business users/AI agents. It defines what data means, not just its format. A column called bal_amt becomes "the post-transaction ledger balance in INR as of the last system update."
How
Three tools: (1) Business Glossary — human-readable definitions. (2) dbt Semantic Models — SQL-defined metrics and dimensions. (3) Ontology — formal OWL/RDF class hierarchy with policy annotations.
Why
Without semantics, AI agents guess what columns mean and generate wrong Cypher/SQL. Regulators can't rely on reports where the same term means different things in different systems.
Where
Between every data system and its consumers: between the data catalog and analysts, between the knowledge graph and AI agents, between risk systems and regulatory reports.
Importance
The semantic layer is what makes AI trustworthy in banking. An LLM with rich semantic context can answer "What is total exposure?" correctly; without that context, the model hallucinates column names and aggregation logic.
📚 Business Glossary
Controlled vocabulary managed in the data catalog (OpenMetadata). Each term has: definition, synonyms, related terms, owning domain, and links to columns that implement it. Eliminates the "what does NPA mean?" debate between Risk and Finance teams.
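As a rough illustration of the term structure described above, the sketch below models a glossary entry in plain Python and resolves a synonym to its canonical term. It does not use the OpenMetadata API; the class names, field names, the "Risk" owner, and the "Non-Performing Asset" synonym are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GlossaryTerm:
    """One controlled-vocabulary entry (field names are illustrative)."""
    name: str
    definition: str
    owning_domain: str
    synonyms: Tuple[str, ...] = ()
    related_terms: Tuple[str, ...] = ()
    columns: Tuple[str, ...] = ()  # columns that implement the term


class Glossary:
    """Case-insensitive lookup by canonical name or synonym."""

    def __init__(self, terms):
        self._index = {}
        for term in terms:
            for label in (term.name, *term.synonyms):
                key = label.casefold()
                if key in self._index:
                    # Two terms claiming the same label is exactly the
                    # ambiguity a glossary is meant to remove.
                    raise ValueError(f"duplicate glossary label: {label!r}")
                self._index[key] = term

    def lookup(self, label):
        return self._index[label.casefold()]


NPA = GlossaryTerm(
    name="NPA",
    definition="Term loan with overdue_days > 90 (RBI definition).",
    owning_domain="Risk",  # assumed owner, for illustration only
    synonyms=("Non-Performing Asset",),
    related_terms=("Portfolio at Risk",),
    columns=("LOAN_HDR.overdue_days",),
)

glossary = Glossary([NPA])
```

Here `glossary.lookup("non-performing asset")` returns the same `NPA` entry as `glossary.lookup("NPA")`, so both teams land on one definition and one implementing column.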
📊 dbt Semantic Model
SQL-defined entities, dimensions, and metrics that expose business concepts as queryable objects. portfolio_at_risk is defined once in dbt — not calculated differently by every team in every Excel file.
🤔 OWL Ontology
Formal class hierarchy using Web Ontology Language. BankAccount is the parent class; SavingsAccount and LoanAccount are subclasses. GDPR sensitivity and data retention annotations are machine-readable policy.
Banking Terms — Defined Precisely
Ambiguous terms that cause reporting errors when left undefined.
| Term | Common Confusion | Precise Definition |
|---|---|---|
| Balance | Live balance? EOD? Ledger? Available? | bal_amt in ACCT_MASTER = post-transaction ledger balance as of last system update |
| NPA | 30 days? 60 days? 90 days overdue? | RBI definition: overdue_days > 90 for term loans. Computed from LOAN_HDR.overdue_days |
| Portfolio at Risk | Outstanding balance? Overdue amount only? | (Outstanding balance of ALL loans with ANY overdue payment ÷ total outstanding portfolio) × 100 |
| Total Exposure | Credit limit? Drawn amount? Including contingent? | Sum of all account balances + outstanding loan amounts for a customer — on-balance sheet only |
| Customer | Individual? Corporate? Joint? Guarantor? | Individual retail customer with KYC-verified identity. Identified by cust_id across all systems |
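To make two of the definitions in the table concrete, here is a minimal sketch of the NPA test and the Portfolio at Risk formula. The `Loan` type and the sample figures are hypothetical, and treating "any overdue payment" as `overdue_days > 0` is an assumption of this example.

```python
from dataclasses import dataclass

NPA_OVERDUE_DAYS = 90  # glossary threshold: overdue_days > 90


@dataclass(frozen=True)
class Loan:
    loan_id: str
    outstanding_amt: float
    overdue_days: int


def is_npa(loan):
    """NPA per the glossary definition: more than 90 days overdue."""
    return loan.overdue_days > NPA_OVERDUE_DAYS


def portfolio_at_risk(loans):
    """Percentage of outstanding balance held in loans with any overdue payment."""
    total = sum(loan.outstanding_amt for loan in loans)
    if total == 0:
        return 0.0  # empty portfolio: nothing is at risk
    # Assumption: "any overdue payment" means overdue_days > 0.
    at_risk = sum(loan.outstanding_amt for loan in loans if loan.overdue_days > 0)
    return at_risk * 100 / total


loans = [
    Loan("L1", 600_000.0, 0),    # current
    Loan("L2", 300_000.0, 15),   # overdue, but not NPA
    Loan("L3", 100_000.0, 120),  # overdue and NPA
]
```

With these sample loans only L3 is an NPA, while Portfolio at Risk is 40 percent because the full outstanding balance of both L2 and L3 counts, not just the overdue instalments.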
🤔 Why Ontologies Matter for AI & Compliance
- Machine-Readable Policy: OWL annotations like gdpr:personalData true on a class propagate to all instances, so automation enforces masking without manually tagging every column.
- Inference: A reasoner can infer that a SavingsAccount IS a BankAccount, so all BankAccount policies automatically apply to savings accounts with no manual configuration.
- AI Grounding: When the LLM sees the ontology, it knows the class hierarchy and can generate correct Cypher that traverses the right node labels, not hallucinated ones.
- Regulatory Proof: A formal ontology is an auditable artefact proving that your data model correctly encodes regulatory definitions, which is useful in BCBS 239 and Solvency II assessments.
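The inference and policy-propagation ideas above can be sketched without any ontology tooling. The example below is a plain-Python stand-in, not RDFLib or an OWL reasoner: the dictionaries and function names are hypothetical, and the rule that annotations flow from a class to its subclasses is written out explicitly rather than derived by a reasoner.

```python
# Hypothetical class hierarchy: child -> parent (mirrors rdfs:subClassOf).
SUBCLASS_OF = {
    "SavingsAccount": "BankAccount",
    "LoanAccount": "BankAccount",
}

# Policy annotations attached directly to a class.
ANNOTATIONS = {
    "BankAccount": {"gdpr:personalData": True},
}


def ancestors(cls):
    """Return cls followed by its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def is_a(cls, parent):
    """True if cls is parent or a (transitive) subclass of it."""
    return parent in ancestors(cls)


def effective_policy(cls):
    """Merge annotations from the root down, so a subclass can override."""
    policy = {}
    for name in reversed(ancestors(cls)):
        policy.update(ANNOTATIONS.get(name, {}))
    return policy
```

In this sketch `effective_policy("SavingsAccount")` picks up the `gdpr:personalData` annotation even though it was only declared on `BankAccount`, which is the behaviour a masking job would rely on.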
    -- mart/customer_360.sql
    SELECT
        a.cust_id,
        a.acct_no,
        a.bal_amt,
        a.acct_type,
        l.outstanding_amt,
        l.overdue_days,
        -- Semantic metric: total_exposure (defined in glossary)
        COALESCE(a.bal_amt, 0) + COALESCE(l.outstanding_amt, 0) AS total_exposure,
        -- Compliance flag: NPA per RBI definition
        CASE WHEN l.overdue_days > 90 THEN true ELSE false END AS overdue_flag,
        -- Risk bucketing
        CASE
            WHEN a.bal_amt < 10000 THEN 'low'
            WHEN a.bal_amt < 100000 THEN 'mid'
            ELSE 'high'
        END AS balance_category
    FROM staging.stg_acct_master a
    LEFT JOIN staging.stg_loan_hdr l
        ON a.cust_id = l.cust_id
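To see how the derived columns behave on concrete rows, the sketch below runs a trimmed version of the mart query against an in-memory SQLite database using Python's standard library. The sample rows and column types are invented for illustration, the `staging` schema is simulated with an attached database, and the flag is emitted as 1/0 rather than true/false for portability across SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An attached in-memory database stands in for the "staging" schema.
conn.execute("ATTACH DATABASE ':memory:' AS staging")
conn.executescript("""
    CREATE TABLE staging.stg_acct_master (
        cust_id TEXT, acct_no TEXT, bal_amt REAL, acct_type TEXT);
    CREATE TABLE staging.stg_loan_hdr (
        cust_id TEXT, outstanding_amt REAL, overdue_days INTEGER);
    -- C1 has an account and no loan; C2 has an account and an overdue loan.
    INSERT INTO staging.stg_acct_master VALUES
        ('C1', 'A-100', 5000, 'savings'),
        ('C2', 'A-200', 250000, 'current');
    INSERT INTO staging.stg_loan_hdr VALUES ('C2', 400000, 120);
""")

rows = conn.execute("""
    SELECT
        a.cust_id,
        COALESCE(a.bal_amt, 0) + COALESCE(l.outstanding_amt, 0) AS total_exposure,
        CASE WHEN l.overdue_days > 90 THEN 1 ELSE 0 END AS overdue_flag,
        CASE
            WHEN a.bal_amt < 10000 THEN 'low'
            WHEN a.bal_amt < 100000 THEN 'mid'
            ELSE 'high'
        END AS balance_category
    FROM staging.stg_acct_master a
    LEFT JOIN staging.stg_loan_hdr l ON a.cust_id = l.cust_id
    ORDER BY a.cust_id
""").fetchall()

for row in rows:
    print(row)
```

The customer without a loan still appears, with the missing loan amount treated as zero and the flag unset, which is what the LEFT JOIN and COALESCE are there for. Because the join key is cust_id, a customer with several accounts and several loans would produce one row per account-loan pair, so a customer-level total would need an aggregation on top.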
dbt + ontology/banking_ontology.py
The demo defines the semantic layer in two places:
(1) dbt — staging models for all three databases, plus the customer_360 mart with total_exposure, overdue_flag, and balance_category.
(2) RDFLib ontology — BankAccount class hierarchy with GDPR annotations (contains_pii: true, gdpr_basis: legitimate_interest).
OpenMetadata links glossary terms back to dbt model columns, creating the full semantic chain from raw table → business concept.