Knowledge Graphs — Banking & Compliance Hub

Overview

What · How · Why · Where · Importance

❓

What

A knowledge graph is a network of entities (nodes) and their relationships (edges). In banking: Customers → Accounts → Loans → Branches, with relationships carrying transaction history, ownership stakes, and policy metadata.

⚙

How

Graph databases such as Neo4j store and query this structure natively. The Cypher query language matches patterns by following stored relationships directly, often in milliseconds, so multi-hop queries avoid the chained JOINs that normalized tables require.

✔

Why

Relational databases handle queries like "find all entities connected to this customer within 3 hops" poorly: each additional hop needs another self-JOIN or a recursive query. That pattern is exactly what AML and fraud detection and customer 360 require.
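To make the 3-hop pattern concrete, here is a minimal Python sketch of a hop-limited breadth-first traversal over an in-memory adjacency map. The node IDs and the `within_hops` helper are hypothetical and exist only for illustration; a graph database runs the equivalent traversal over its stored relationships.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Return {node: hop_distance} for nodes reachable from start in <= max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]  # the start node is not its own neighbour
    return distances


# Hypothetical sample: customer - account - branch - account - customer
edges = [
    ("CUST_A", "ACCT_1"),
    ("ACCT_1", "BRANCH_X"),
    ("BRANCH_X", "ACCT_2"),
    ("ACCT_2", "CUST_B"),
]
adjacency = {}
for left, right in edges:
    # Treat edges as undirected for neighbourhood discovery.
    adjacency.setdefault(left, set()).add(right)
    adjacency.setdefault(right, set()).add(left)

reachable = within_hops(adjacency, "CUST_A", 3)
```

In this sample CUST_B sits four hops away, so it falls outside the 3-hop neighbourhood. Each node is visited at most once, so the cost depends on the size of the neighbourhood rather than on the size of the whole data set.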

🏠

Where

AML network analysis, KYC entity resolution, customer 360 profiling, product recommendation, correspondent banking risk, and AI agent query backends.

⭐

Importance

Knowledge graphs can serve as the substrate on which AI agents operate. They provide the structured, relationship-rich context that LLMs need to answer banking questions accurately, grounding answers in stored facts and reducing the risk of hallucination.

Comparison

Graph vs Relational — When to Use Which

Query Type	Relational (SQL)	Graph (Cypher)
Customer total exposure	3-table JOIN — feasible	1-hop traversal — fast
Find all accounts at same branch	WHERE clause — easy	Pattern match — easy
Customers with both savings + overdue loan	Complex JOIN + subquery	Simple 2-hop pattern
AML: who shares a phone with a flagged customer	Self-JOIN across millions of rows	Direct edge traversal
Aggregated balance report	GROUP BY — optimal	Possible, but not native strength
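The shared-phone row can be illustrated with a small Python sketch that treats each phone number as a hub connecting customers. The `phone` and `flagged` fields and the sample records are assumptions made for this example; they are not part of the schema described in this document.

```python
from collections import defaultdict


def shares_phone_with_flagged(customers):
    """Return IDs of unflagged customers who share a phone with a flagged one."""
    by_phone = defaultdict(set)
    for cust in customers:
        by_phone[cust["phone"]].add(cust["cust_id"])
    flagged = {c["cust_id"] for c in customers if c["flagged"]}
    linked = set()
    for ids in by_phone.values():
        if ids & flagged:  # at least one flagged customer uses this phone
            linked |= ids - flagged
    return linked


# Hypothetical sample records.
customers = [
    {"cust_id": "CUST_1", "phone": "555-0100", "flagged": True},
    {"cust_id": "CUST_2", "phone": "555-0100", "flagged": False},
    {"cust_id": "CUST_3", "phone": "555-0199", "flagged": False},
]
linked_customers = shares_phone_with_flagged(customers)
```

Grouping by phone is the in-memory equivalent of modelling the phone as its own node: customers who share it end up two hops apart.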

Structure

Banking Knowledge Graph Schema

Nodes, properties, and relationships that capture the banking domain.

👥 Customer Node

Properties: cust_id, name, city, kyc_status, risk_category. The anchor entity — everything else connects through Customer.

💵 Account Nodes

SavingsAccount: acct_no, bal_amt, open_date, status.
CurrentAccount: same structure, different type label enabling type-specific policies.

📈 LoanAccount Node

Properties: loan_id, principal_amt, outstanding_amt, overdue_days, loan_type. overdue_days > 90 → NPA flag.

🏠 Branch Node

Properties: branch_code, city, region. Used for geographic concentration risk analysis, for example finding which branch has the highest portfolio at risk (PAR).

→ OWNS Relationship

(Customer)-[:OWNS]->(SavingsAccount)
(Customer)-[:OWNS]->(LoanAccount)
The ownership edge carries a since date, which supports temporal queries.

→ HELD_AT / DISBURSED_AT

(Account)-[:HELD_AT]->(Branch)
(LoanAccount)-[:DISBURSED_AT]->(Branch)
These edges support branch-level concentration analysis and geographic risk heatmaps.

🔍 Banking Use Cases for Knowledge Graphs

Customer 360: A short traversal from the Customer node collects all accounts, loans, and transactions for any customer, with no JOIN across 10 tables.
AML Typology Detection: Circular funds flow and layering through multiple accounts are graph patterns; pattern matching can surface cases that rule engines miss.
KYC Entity Resolution: A shared phone number, address, or director links multiple "different" customers; graph deduplication finds them.
Portfolio Concentration Risk: "Which branch has >30% of total loan exposure?" Traverse DISBURSED_AT edges and aggregate outstanding amounts.
Product Recommendation: Customers with a savings account but no loan, whose profile resembles that of existing loan customers, are candidates; this is graph-based collaborative filtering.
AI Agent Backend: The LLM generates Cypher, the graph executes it in milliseconds, and the agent returns the answer with provenance (which nodes were traversed).
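As one illustration of the AML use case above, the following Python sketch finds circular funds flows that start and end at a given account. It assumes transfers are available as directed (source, destination) pairs; that edge type, the account IDs, and the `find_cycles_from` helper are hypothetical and not part of the schema described here.

```python
def find_cycles_from(transfers, start, max_len):
    """Return simple directed cycles through start that use at most max_len transfers."""
    outgoing = {}
    for src, dst in transfers:
        outgoing.setdefault(src, []).append(dst)

    cycles = []

    def walk(node, path):
        for nxt in outgoing.get(node, []):
            if nxt == start:
                cycles.append(path + [start])  # funds returned to the origin
            elif nxt not in path and len(path) < max_len:
                walk(nxt, path + [nxt])

    walk(start, [start])
    return cycles


# Hypothetical transfers: A -> B -> C -> A forms a loop; C -> D is a dead end.
transfers = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
loops = find_cycles_from(transfers, "A", 4)
```

The length cap keeps the search bounded; without it, dense transfer networks can produce a very large number of candidate paths.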

Example Cypher Queries

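// Customer total financial exposure
// coalesce both properties: null + number is null in Cypher and sum() skips nulls,
// so loan nodes (which have no bal_amt) would otherwise be dropped from the total
MATCH (c:Customer {cust_id: 'CUST_0042'})-[:OWNS]->(a)
RETURN c.name,
       sum(coalesce(a.bal_amt, 0) + coalesce(a.outstanding_amt, 0)) AS total_exposure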

// Customers with savings AND overdue loan
MATCH (c:Customer)-[:OWNS]->(s:SavingsAccount),
      (c)-[:OWNS]->(l:LoanAccount)
WHERE l.overdue_days > 0
RETURN count(DISTINCT c) AS at_risk_customers

// Branch with highest portfolio at risk (PAR)
// PAR % = overdue outstanding / total outstanding * 100, per branch
MATCH (l:LoanAccount)-[:DISBURSED_AT]->(b:Branch)
WITH b, sum(l.outstanding_amt) AS total_outstanding,
        sum(CASE WHEN l.overdue_days > 0 THEN l.outstanding_amt ELSE 0 END) AS overdue_amt
WHERE total_outstanding > 0  // skip branches with nothing outstanding
RETURN b.branch_code, round(overdue_amt * 100.0 / total_outstanding, 2) AS par_pct
ORDER BY par_pct DESC LIMIT 5
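The portfolio-at-risk calculation in the last query can be checked by hand with a small Python equivalent. The loan records below are invented sample data, and `par_by_branch` is a hypothetical helper that mirrors the query's arithmetic rather than anything shipped with the demo.

```python
from collections import defaultdict


def par_by_branch(loans):
    """Return [(branch_code, par_pct)] sorted by PAR, highest first."""
    total = defaultdict(float)
    overdue = defaultdict(float)
    for loan in loans:
        total[loan["branch_code"]] += loan["outstanding_amt"]
        if loan["overdue_days"] > 0:
            overdue[loan["branch_code"]] += loan["outstanding_amt"]
    # Branches with nothing outstanding are skipped to avoid dividing by zero.
    par = {
        branch: round(overdue[branch] * 100.0 / amount, 2)
        for branch, amount in total.items()
        if amount > 0
    }
    return sorted(par.items(), key=lambda item: item[1], reverse=True)


# Invented sample loans.
loans = [
    {"branch_code": "BR01", "outstanding_amt": 100000, "overdue_days": 0},
    {"branch_code": "BR01", "outstanding_amt": 50000, "overdue_days": 30},
    {"branch_code": "BR02", "outstanding_amt": 200000, "overdue_days": 120},
    {"branch_code": "BR02", "outstanding_amt": 200000, "overdue_days": 0},
]
ranking = par_by_branch(loans)
```

For this sample, BR02 has 200000 of 400000 overdue (50.0%) and BR01 has 50000 of 150000 overdue (33.33%), so BR02 ranks first.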

🔧 In the Demo

Neo4j + `knowledge_graph/load_graph.py`

The demo loads all 1 000 customers, 1 000 accounts, 500 loans, and branches from PostgreSQL into Neo4j using load_graph.py. Relationships are created with MERGE for idempotency. The AI agent then uses Ollama to generate Cypher queries and executes them against this graph — answering natural language questions with provenance of which nodes were traversed.
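The MERGE-based load can be sketched roughly as follows. This is not the contents of load_graph.py, which is not shown here; the `MERGE_OWNS` statement, the `to_batches` and `load_ownership` helpers, the batch size, and the `RecordingSession` stand-in are all assumptions for illustration. The only driver behaviour relied on is that a Neo4j Python driver session accepts `run(query, **parameters)`.

```python
# Parameterized Cypher. MERGE matches an existing node or relationship, or
# creates it if absent, so re-running the load does not create duplicates.
MERGE_OWNS = """
UNWIND $rows AS row
MERGE (c:Customer {cust_id: row.cust_id})
MERGE (a:SavingsAccount {acct_no: row.acct_no})
SET a.bal_amt = row.bal_amt
MERGE (c)-[:OWNS]->(a)
"""


def to_batches(rows, batch_size):
    """Yield consecutive slices of rows, each at most batch_size long."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def load_ownership(session, rows, batch_size=500):
    """Send rows in batches; session is any object exposing run(query, **params)."""
    batches_sent = 0
    for batch in to_batches(rows, batch_size):
        session.run(MERGE_OWNS, rows=batch)
        batches_sent += 1
    return batches_sent


class RecordingSession:
    """Stand-in for a database session so the sketch runs without Neo4j."""

    def __init__(self):
        self.calls = []

    def run(self, query, **params):
        self.calls.append((query, params))


# Hypothetical rows, as they might be read from PostgreSQL.
rows = [
    {"cust_id": "CUST_0001", "acct_no": "SA_0001", "bal_amt": 1200.0},
    {"cust_id": "CUST_0002", "acct_no": "SA_0002", "bal_amt": 560.5},
    {"cust_id": "CUST_0003", "acct_no": "SA_0003", "bal_amt": 98.0},
]
session = RecordingSession()
batches_sent = load_ownership(session, rows, batch_size=2)
```

Passing rows as a parameter and expanding them with UNWIND keeps values out of the query text and sends one statement per batch instead of one per row.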

