From Legacy Chaos to Governed Intelligence
A bank holds petabytes of customer data spread across mainframes, enterprise data warehouses (EDWs), and reporting databases, yet cannot answer the question "What is a customer's total financial exposure?" in real time. This hub explains the concepts, patterns, and technologies needed to go from that chaos to a clean, auditable, AI-queryable knowledge layer, then shows how the demo implements each piece.
Choose a Topic
Each page covers What · How · Why · Where · Importance for one pillar of modern banking data architecture.
Data Governance
Ownership, stewardship, policies, and cataloging — the backbone that makes all other layers trustworthy.
Banking Compliance
Basel III, GDPR, AML, KYC — what regulators require and how data architecture enables auditability.
Data Contracts
Schema agreements, SLA commitments, and quality rules that make producer–consumer trust explicit.
Knowledge Graphs
Graph databases connecting customers, accounts, loans, and branches — enabling relationship-based queries.
Semantic Layer & Ontologies
Business glossaries, ontologies, and semantic models that translate technical data into business meaning.
Context Engineering
Curating and serving rich context to AI agents — the discipline that determines how well AI understands your data.
AI Agents in Banking
Natural language to Cypher, RAG over knowledge graphs, and agentic workflows for banking queries.
Demo Implementation
How every concept maps to a running local demo — Docker, dbt, Neo4j, Ollama, Great Expectations, and more.
The Full Stack, Layer by Layer
Follow data from ingestion through to AI consumption.
🖼 Ingestion — Legacy Estate
PostgreSQL (core banking / Finacle), MySQL (data warehouse end-of-day snapshot), SQLite (MIS reporting). The data in all three is raw, inconsistent, and siloed.
🔍 Discovery — Metadata Catalog
OpenMetadata scans all three databases and captures tables, columns, row counts, and lineage, producing a unified technical catalog.
✓ Quality — Profiling & Contracts
Great Expectations profiles data (nulls, distributions, anomalies). YAML contracts enforce schema, freshness SLA, and quality rules.
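To make the contract idea concrete, here is a minimal sketch of a contract check in plain Python. It is not the Great Expectations API and not the demo's actual contract format: the `CONTRACT` dict stands in for what a YAML contract file might parse into, and the field names (`freshness_max_age_hours`, `not_null`) and the `check_contract` helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract, shown as the dict a YAML file might parse into.
CONTRACT = {
    "dataset": "accounts",
    "schema": {"account_id": str, "customer_id": str, "balance": float},
    "freshness_max_age_hours": 24,
    "not_null": ["account_id", "customer_id"],
}


def check_contract(rows, loaded_at, contract, now=None):
    """Return a list of human-readable violations (empty list means pass)."""
    now = now or datetime.now(timezone.utc)
    violations = []

    # Freshness SLA: the last load must be recent enough.
    max_age = timedelta(hours=contract["freshness_max_age_hours"])
    if now - loaded_at > max_age:
        violations.append("freshness: data older than SLA")

    # Schema and quality rules, checked row by row.
    for i, row in enumerate(rows):
        for column, expected_type in contract["schema"].items():
            if column not in row:
                violations.append(f"row {i}: missing column {column}")
                continue
            value = row[column]
            if value is None:
                if column in contract["not_null"]:
                    violations.append(f"row {i}: {column} is null")
            elif not isinstance(value, expected_type):
                violations.append(f"row {i}: {column} has wrong type")
    return violations
```

A producer could run a check like this before publishing a dataset and block the release when the returned list is non-empty, which is one way the producer-consumer agreement becomes enforceable rather than informal.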
📊 Semantics — dbt Semantic Models
dbt staging and mart models. The customer_360 mart joins accounts and loans and computes portfolio_at_risk and total_exposure.
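The shape of a customer_360 style mart can be sketched with an in-memory SQLite database. This is an illustration, not the demo's dbt model: the table columns, the sample rows, and both metric definitions are assumptions. Here total_exposure is taken to be the sum of outstanding loan balances, and portfolio_at_risk the share of that balance on loans more than 30 days past due.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical staging tables with a few sample rows.
conn.executescript("""
CREATE TABLE accounts (account_id TEXT, customer_id TEXT, balance REAL);
CREATE TABLE loans (loan_id TEXT, customer_id TEXT,
                    outstanding REAL, days_past_due INTEGER);
INSERT INTO accounts VALUES ('A1','C1',500.0), ('A2','C1',250.0), ('A3','C2',100.0);
INSERT INTO loans VALUES ('L1','C1',1000.0,0), ('L2','C1',3000.0,45);
""")

# Aggregate each source per customer first, then join, so that a customer
# with several accounts and several loans is not double counted.
CUSTOMER_360 = """
WITH acct AS (
    SELECT customer_id, COUNT(*) AS account_count, SUM(balance) AS total_balance
    FROM accounts GROUP BY customer_id
),
loan AS (
    SELECT customer_id,
           SUM(outstanding) AS total_exposure,
           SUM(CASE WHEN days_past_due > 30 THEN outstanding ELSE 0 END) AS at_risk
    FROM loans GROUP BY customer_id
)
SELECT acct.customer_id,
       acct.account_count,
       acct.total_balance,
       COALESCE(loan.total_exposure, 0) AS total_exposure,
       CASE WHEN loan.total_exposure > 0
            THEN loan.at_risk / loan.total_exposure ELSE 0 END AS portfolio_at_risk
FROM acct LEFT JOIN loan ON loan.customer_id = acct.customer_id
ORDER BY acct.customer_id
"""

customer_360 = conn.execute(CUSTOMER_360).fetchall()
for row in customer_360:
    print(row)
```

Aggregating before the join is the design choice worth noting: joining raw accounts to raw loans on customer_id would multiply rows and inflate both sums.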
🔁 Knowledge Graph — Neo4j
1 000 customers, 1 000 accounts, 500 loans, and branches loaded as nodes. Relationships: OWNS, HELD_AT, DISBURSED_AT.
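Loading relationships like these can be sketched as building parameterized Cypher MERGE statements. The helper below is hypothetical and does not reflect the demo's load_graph.py. The endpoint labels and key properties in `RELATIONSHIPS` are assumptions too, since the text names the relationship types but not their directions or endpoints.

```python
# Assumed endpoints for each relationship type:
# (start label, start key property, end label, end key property).
RELATIONSHIPS = {
    "OWNS": ("Customer", "customer_id", "Account", "account_id"),
    "HELD_AT": ("Account", "account_id", "Branch", "branch_id"),
    "DISBURSED_AT": ("Loan", "loan_id", "Branch", "branch_id"),
}


def merge_relationship(rel_type, start_key, end_key):
    """Build a parameterized Cypher MERGE for one relationship.

    Labels and relationship types come from the fixed table above;
    only the key values travel as query parameters.
    """
    start_label, start_prop, end_label, end_prop = RELATIONSHIPS[rel_type]
    cypher = (
        f"MERGE (a:{start_label} {{{start_prop}: $start_key}}) "
        f"MERGE (b:{end_label} {{{end_prop}: $end_key}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return cypher, {"start_key": start_key, "end_key": end_key}


cypher, params = merge_relationship("OWNS", "C1", "A1")
print(cypher)
print(params)
```

MERGE rather than CREATE keeps the load idempotent, so rerunning it does not duplicate nodes or edges. With the official Neo4j Python driver, each pair could be passed to `session.run(cypher, params)`.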
📜 Ontology — RDFLib
OWL class hierarchy (BankAccount → SavingsAccount / LoanAccount) with GDPR policy annotations and data sensitivity tags.
🤖 AI Agent — Ollama + Cypher
Natural language question → Ollama generates Cypher → Neo4j executes → answer with provenance returned to the user.
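The question-to-answer flow can be sketched with the model and the database passed in as plain callables. This is an illustrative skeleton, not the demo's agent.py: `ask`, `Answer`, `fake_llm`, and `fake_graph` are hypothetical names, the stand-ins return canned values so the sketch runs without Ollama or Neo4j, and "provenance" is assumed here to mean returning the generated Cypher alongside the rows.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    question: str
    cypher: str  # provenance: the query that produced the rows
    rows: list


def ask(question: str,
        generate_cypher: Callable[[str], str],
        run_query: Callable[[str], list]) -> Answer:
    """Question -> generated Cypher -> executed rows, kept together."""
    cypher = generate_cypher(question).strip()
    rows = run_query(cypher)
    return Answer(question=question, cypher=cypher, rows=rows)


# Stand-ins with canned outputs; a real setup would call the model and the graph.
def fake_llm(question: str) -> str:
    return "MATCH (c:Customer)-[:OWNS]->(a:Account) RETURN count(a) AS accounts\n"


def fake_graph(cypher: str) -> list:
    return [{"accounts": 1000}]


answer = ask("How many accounts do customers own?", fake_llm, fake_graph)
print(answer.cypher)
print(answer.rows)
```

Keeping the generated query next to the result lets a reviewer see exactly which Cypher produced an answer, which matters when the answer feeds an auditable process.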
Ready to run it locally?
Go to the Implementation page for exact commands:
docker compose up → python generate_data.py → dbt run → python load_graph.py → python agent.py.
The full walkthrough maps each step to its concept page.