Semantic Layer & Ontologies

Translating raw technical data into shared business meaning — through business glossaries, dbt semantic models, and formal OWL ontologies.

Overview

What · How · Why · Where · Importance

What

The semantic layer sits between raw data and business users/AI agents. It defines what data means, not just its format. A column called bal_amt becomes "the post-transaction ledger balance in INR as of the last system update."

How

Three tools: (1) Business Glossary — human-readable definitions. (2) dbt Semantic Models — SQL-defined metrics and dimensions. (3) Ontology — formal OWL/RDF class hierarchy with policy annotations.

Why

Without semantics, AI agents guess what columns mean and generate wrong Cypher/SQL. Regulators can't rely on reports where the same term means different things in different systems.

Where

Between every data system and its consumers: between the data catalog and analysts, between the knowledge graph and AI agents, between risk systems and regulatory reports.

Importance

The semantic layer is what makes AI trustworthy in banking. An LLM with rich semantic context can answer "What is total exposure?" correctly. Without that context, the model hallucinates column names and aggregation logic.

Three Pillars

Business Glossary · dbt Metrics · Ontology

📚 Business Glossary

Controlled vocabulary managed in the data catalog (OpenMetadata). Each term has: definition, synonyms, related terms, owning domain, and links to columns that implement it. Eliminates the "what does NPA mean?" debate between Risk and Finance teams.
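One way to picture a glossary entry is as a small record carrying the fields listed above. The sketch below is plain Python, not OpenMetadata's API; the `GlossaryTerm` class, the `resolve` helper, the owning domain, and the sample synonym are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    """One controlled-vocabulary entry (hypothetical structure)."""
    name: str
    definition: str
    owning_domain: str
    synonyms: tuple[str, ...] = ()
    related_terms: tuple[str, ...] = ()
    columns: tuple[str, ...] = ()  # physical columns that implement the term


NPA = GlossaryTerm(
    name="NPA",
    definition="Term loan with overdue_days > 90 (RBI definition).",
    owning_domain="Risk",                # assumed owner, for illustration
    synonyms=("non-performing asset",),  # assumed synonym
    related_terms=("Portfolio at Risk",),
    columns=("LOAN_HDR.overdue_days",),
)

GLOSSARY = {NPA.name.lower(): NPA}


def resolve(label: str) -> GlossaryTerm | None:
    """Map a term name or synonym (case-insensitive) to its single entry."""
    key = label.strip().lower()
    if key in GLOSSARY:
        return GLOSSARY[key]
    for term in GLOSSARY.values():
        if key in (s.lower() for s in term.synonyms):
            return term
    return None


print(resolve("Non-Performing Asset").columns)
```

Because every synonym resolves to the same record, Risk and Finance end up reading one definition and one column mapping.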

📊 dbt Semantic Model

SQL-defined entities, dimensions, and metrics that expose business concepts as queryable objects. portfolio_at_risk is defined once in dbt — not calculated differently by every team in every Excel file.

🤔 OWL Ontology

Formal class hierarchy using Web Ontology Language. BankAccount is the parent class; SavingsAccount and LoanAccount are subclasses. GDPR sensitivity and data retention annotations are machine-readable policy.
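The hierarchy and its annotations can be pictured as subject-predicate-object triples, the data model that RDF and OWL build on. The sketch below uses plain Python tuples rather than an RDF library. The `superclasses` and `policy` helpers are hypothetical, and the upward lookup is an application-level convention: OWL annotation properties carry no inheritance semantics of their own.

```python
# Triples as (subject, predicate, object); a stand-in for an RDF graph.
TRIPLES = {
    ("SavingsAccount", "subClassOf", "BankAccount"),
    ("LoanAccount", "subClassOf", "BankAccount"),
    ("BankAccount", "contains_pii", True),
    ("BankAccount", "gdpr_basis", "legitimate_interest"),
}


def superclasses(cls):
    """Return cls followed by its ancestors, nearest first."""
    chain = [cls]
    while True:
        parents = [o for s, p, o in TRIPLES
                   if s == chain[-1] and p == "subClassOf"]
        if not parents:
            return chain
        chain.append(parents[0])  # sketch assumes single inheritance, no cycles


def policy(cls, predicate):
    """Look up an annotation on cls, falling back to the nearest ancestor."""
    for candidate in superclasses(cls):
        for s, p, o in TRIPLES:
            if s == candidate and p == predicate:
                return o
    return None


print(policy("SavingsAccount", "contains_pii"))
print(policy("LoanAccount", "gdpr_basis"))
```

A consumer such as an AI agent or a masking job could call `policy` before reading a column, so the PII decision comes from the ontology rather than from hard-coded lists.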

Glossary in Action

Banking Terms — Defined Precisely

Ambiguous terms that cause reporting errors when left undefined.

| Term | Common Confusion | Precise Definition |
| --- | --- | --- |
| Balance | Live balance? EOD? Ledger? Available? | bal_amt in ACCT_MASTER: post-transaction ledger balance as of last system update |
| NPA | 30 days? 60 days? 90 days overdue? | RBI definition: overdue_days > 90 for term loans. Computed from LOAN_HDR.overdue_days |
| Portfolio at Risk | Outstanding balance? Overdue amount only? | (Outstanding balance of ALL loans with ANY overdue payment ÷ total outstanding portfolio) × 100 |
| Total Exposure | Credit limit? Drawn amount? Including contingent? | Sum of all account balances + outstanding loan amounts for a customer; on-balance-sheet only |
| Customer | Individual? Corporate? Joint? Guarantor? | Individual retail customer with KYC-verified identity. Identified by cust_id across all systems |
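As a worked example of the NPA and Portfolio at Risk definitions above, the sketch below applies them to three made-up loans. The `Loan` record and the amounts are illustrative only, and "any overdue payment" is read here as `overdue_days > 0`, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    outstanding_amt: int  # illustrative amounts in INR
    overdue_days: int


def is_npa(loan):
    """NPA per the glossary: more than 90 days overdue."""
    return loan.overdue_days > 90


def portfolio_at_risk(loans):
    """Share (in percent) of outstanding balance held by loans with any overdue."""
    total = sum(l.outstanding_amt for l in loans)
    if total == 0:
        return 0.0
    at_risk = sum(l.outstanding_amt for l in loans if l.overdue_days > 0)
    return at_risk * 100 / total


loans = [Loan(600_000, 0), Loan(300_000, 15), Loan(100_000, 120)]
print(portfolio_at_risk(loans))     # 40.0
print([is_npa(l) for l in loans])   # [False, False, True]
```

The loan that is 15 days overdue counts toward Portfolio at Risk with its full outstanding balance but is not an NPA, which is the kind of distinction that gets lost when the terms are left undefined.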

🤔 Why Ontologies Matter for AI & Compliance

dbt Semantic Model — customer_360
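-- mart/customer_360.sql
-- Grain: one row per account-loan pair for a customer (join key is cust_id).
SELECT
    a.cust_id,
    a.acct_no,
    a.bal_amt,
    a.acct_type,
    l.outstanding_amt,
    l.overdue_days,
    -- Semantic metric: total_exposure (defined in glossary).
    -- Computed per row: it matches the per-customer glossary figure only when
    -- the customer has a single account and at most one loan.
    COALESCE(a.bal_amt, 0) + COALESCE(l.outstanding_amt, 0) AS total_exposure,
    -- Compliance flag: NPA per RBI definition (more than 90 days overdue).
    -- Rows with no loan have NULL overdue_days and are flagged false.
    CASE WHEN l.overdue_days > 90 THEN true ELSE false END AS overdue_flag,
    -- Balance bucketing; a NULL balance is treated as 0, as in total_exposure
    CASE
      WHEN COALESCE(a.bal_amt, 0) < 10000 THEN 'low'
      WHEN COALESCE(a.bal_amt, 0) < 100000 THEN 'mid'
      ELSE 'high'
    END AS balance_category
FROM staging.stg_acct_master a
LEFT JOIN staging.stg_loan_hdr l ON a.cust_id = l.cust_id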
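
The glossary defines total exposure per customer, so one possible check is to aggregate accounts and loans separately before combining them. The sketch below runs such a query with Python's built-in sqlite3 on made-up rows; the unqualified table names, the `has_npa_loan` column, and the sample values are assumptions, not the demo's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_acct_master (cust_id TEXT, acct_no TEXT, bal_amt REAL);
CREATE TABLE stg_loan_hdr (cust_id TEXT, outstanding_amt REAL, overdue_days INTEGER);
INSERT INTO stg_acct_master VALUES ('C1','A1',5000), ('C1','A2',20000), ('C2','A3',1000);
INSERT INTO stg_loan_hdr VALUES ('C1',100000,120), ('C1',50000,0);
""")

QUERY = """
WITH acct AS (      -- one row per customer: summed balances
    SELECT cust_id, SUM(COALESCE(bal_amt, 0)) AS balance_total
    FROM stg_acct_master
    GROUP BY cust_id
),
loan AS (           -- one row per customer: summed loans, worst overdue
    SELECT cust_id,
           SUM(COALESCE(outstanding_amt, 0)) AS loan_total,
           MAX(overdue_days) AS max_overdue_days
    FROM stg_loan_hdr
    GROUP BY cust_id
)
SELECT a.cust_id,
       a.balance_total + COALESCE(l.loan_total, 0) AS total_exposure,
       COALESCE(l.max_overdue_days, 0) > 90 AS has_npa_loan
FROM acct a
LEFT JOIN loan l ON l.cust_id = a.cust_id
ORDER BY a.cust_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('C1', 175000.0, 1), ('C2', 1000.0, 0)]
```

Joining the two raw tables directly on cust_id would return one row per account-loan pair (four rows for C1 here), so pre-aggregating each side keeps every balance and every loan counted exactly once.
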
🔧 In the Demo

dbt + ontology/banking_ontology.py

The demo defines the semantic layer in two places: (1) dbt: staging models for all three databases, plus the customer_360 mart with total_exposure, overdue_flag, and balance_category. (2) RDFLib ontology: a BankAccount class hierarchy with GDPR annotations (contains_pii: true, gdpr_basis: legitimate_interest). OpenMetadata links glossary terms back to dbt model columns, completing the semantic chain from raw table → business concept.
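
The chain from raw table to business concept can be pictured as two lookups. The sketch below is plain Python with hypothetical mappings, not OpenMetadata's API; the dictionary names and the `trace` helper are assumptions, and the links are inferred from the customer_360 model and the glossary rather than taken from the demo's catalog.

```python
# Raw columns feeding each mart column (read off the customer_360 SELECT list).
MART_LINEAGE = {
    "customer_360.total_exposure": ["ACCT_MASTER.bal_amt", "LOAN_HDR.outstanding_amt"],
    "customer_360.overdue_flag": ["LOAN_HDR.overdue_days"],
}

# Glossary term attached to each mart column (links assumed for illustration).
TERM_LINKS = {
    "customer_360.total_exposure": "Total Exposure",
    "customer_360.overdue_flag": "NPA",
}


def trace(raw_column):
    """Return (raw column, mart column, glossary term) for every known path."""
    return [
        (raw_column, mart_col, TERM_LINKS[mart_col])
        for mart_col, sources in MART_LINEAGE.items()
        if raw_column in sources and mart_col in TERM_LINKS
    ]


print(trace("LOAN_HDR.overdue_days"))
```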