Knowledge Graphs in Banking

Graph databases model customers, accounts, loans, and branches as nodes and relationships, which makes relationship-based queries that are awkward or slow in relational tables straightforward to express.

Overview

What · How · Why · Where · Importance

What

A knowledge graph is a network of entities (nodes) and their relationships (edges). In banking: Customers → Accounts → Loans → Branches, with relationships carrying transaction history, ownership stakes, and policy metadata.

How

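Graph databases such as Neo4j store and query this structure natively. Their query language, Cypher, expresses traversals as path patterns, and the engine follows stored relationships directly instead of computing JOINs across normalized tables, so multi-hop queries typically stay fast as the data grows.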

Why

Relational tables handle queries such as "find all entities connected to this customer within 3 hops" poorly, because each hop adds another self-JOIN. That pattern is exactly what AML network analysis, fraud detection, and customer 360 views require.
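The "within 3 hops" pattern can be illustrated with a plain breadth-first search. This is a minimal sketch over a toy in-memory adjacency list, not how Neo4j works internally; the node names (CUST_A, PHONE_X, and so on) and the within_hops helper are hypothetical. In Cypher the same idea is written as a variable-length pattern such as -[*1..3]-.

```python
from collections import deque

# Toy undirected adjacency list. All node names are hypothetical.
GRAPH = {
    "CUST_A": ["ACCT_1", "PHONE_X"],
    "ACCT_1": ["CUST_A", "BRANCH_1"],
    "PHONE_X": ["CUST_A", "CUST_B"],
    "CUST_B": ["PHONE_X", "ACCT_2"],
    "ACCT_2": ["CUST_B", "BRANCH_1"],
    "BRANCH_1": ["ACCT_1", "ACCT_2", "ACCT_3"],
    "ACCT_3": ["BRANCH_1", "CUST_C"],
    "CUST_C": ["ACCT_3"],
}


def within_hops(graph, start, max_hops):
    """Return {node: hop_distance} for nodes reachable in at most max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    del dist[start]  # the start node is not its own connection
    return dist


if __name__ == "__main__":
    for node, hops in sorted(within_hops(GRAPH, "CUST_A", 3).items()):
        print(node, hops)
```

Each node is visited once, so the cost depends on the size of the neighbourhood rather than on the size of the whole dataset, which is the property the graph model relies on.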

Where

AML network analysis, KYC entity resolution, customer 360 profiling, product recommendation, correspondent banking risk, and AI agent query backends.

Importance

Knowledge graphs give AI agents a structured, relationship-rich context to query. Grounding an LLM's answers in graph data helps it answer banking questions accurately and reduces the risk of hallucination.

Comparison

Graph vs Relational — When to Use Which

Query Type                                      | Relational (SQL)                  | Graph (Cypher)
------------------------------------------------|-----------------------------------|----------------------------------
Customer total exposure                         | 3-table JOIN, feasible            | 1-hop traversal, fast
Find all accounts at same branch                | WHERE clause, easy                | Pattern match, easy
Customers with both savings + overdue loan      | Complex JOIN + subquery           | Simple 2-hop pattern
AML: who shares a phone with a flagged customer | Self-JOIN across millions of rows | Direct edge traversal
Aggregated balance report                       | GROUP BY, optimal                 | Possible, but not native strength

Structure

Banking Knowledge Graph Schema

Nodes, properties, and relationships that capture the banking domain.

👥 Customer Node

Properties: cust_id, name, city, kyc_status, risk_category. The anchor entity — everything else connects through Customer.

💵 Account Nodes

SavingsAccount: acct_no, bal_amt, open_date, status.
CurrentAccount: same structure, different type label enabling type-specific policies.

📈 LoanAccount Node

Properties: loan_id, principal_amt, outstanding_amt, overdue_days, loan_type. Loans with overdue_days > 90 are flagged as non-performing assets (NPA).

🏠 Branch Node

Properties: branch_code, city, region. Used for geographic concentration risk analysis, for example finding which branch has the highest portfolio at risk (PAR).

→ OWNS Relationship

(Customer)-[:OWNS]->(SavingsAccount)
(Customer)-[:OWNS]->(LoanAccount)
The OWNS edge carries a since date, which supports temporal queries.

→ HELD_AT / DISBURSED_AT

(Account)-[:HELD_AT]->(Branch)
(LoanAccount)-[:DISBURSED_AT]->(Branch)
These edges support branch-level concentration analysis and geographic risk heatmaps.
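One way to keep a loader honest about this schema is to check every edge against an allow-list before writing it. The sketch below is an illustration only: ALLOWED_EDGES, is_valid_edge, and invalid_edges are hypothetical names, and two points are assumptions rather than statements from the schema above, namely that customers own CurrentAccount nodes the same way they own SavingsAccount nodes, and that the generic Account label stands for both deposit account types.

```python
# Allowed (start label, relationship type, end label) triples.
# Entries marked "assumed" are not stated explicitly in the schema.
ALLOWED_EDGES = {
    ("Customer", "OWNS", "SavingsAccount"),
    ("Customer", "OWNS", "CurrentAccount"),    # assumed by analogy
    ("Customer", "OWNS", "LoanAccount"),
    ("SavingsAccount", "HELD_AT", "Branch"),   # assumed expansion of Account
    ("CurrentAccount", "HELD_AT", "Branch"),   # assumed expansion of Account
    ("LoanAccount", "DISBURSED_AT", "Branch"),
}


def is_valid_edge(start_label, rel_type, end_label):
    """True if the triple is part of the allow-list."""
    return (start_label, rel_type, end_label) in ALLOWED_EDGES


def invalid_edges(edges):
    """Return the triples that the allow-list does not permit, in input order."""
    return [edge for edge in edges if not is_valid_edge(*edge)]


if __name__ == "__main__":
    candidate = [
        ("Customer", "OWNS", "LoanAccount"),
        ("LoanAccount", "HELD_AT", "Branch"),  # wrong relationship type
    ]
    print(invalid_edges(candidate))
```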

🔍 Banking Use Cases for Knowledge Graphs

Example Cypher Queries
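// Customer total financial exposure (deposit balances plus outstanding loan amounts)
// coalesce on both properties: a missing property is null, and null + number is null
MATCH (c:Customer {cust_id: 'CUST_0042'})-[:OWNS]->(a)
RETURN c.name,
       sum(coalesce(a.bal_amt, 0) + coalesce(a.outstanding_amt, 0)) AS total_exposure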

// Customers with savings AND overdue loan
MATCH (c:Customer)-[:OWNS]->(s:SavingsAccount),
      (c)-[:OWNS]->(l:LoanAccount)
WHERE l.overdue_days > 0
RETURN count(DISTINCT c) AS at_risk_customers

// Branch with highest portfolio at risk
MATCH (l:LoanAccount)-[:DISBURSED_AT]->(b:Branch)
WITH b, sum(l.outstanding_amt) AS total_outstanding,
        sum(CASE WHEN l.overdue_days > 0 THEN l.outstanding_amt ELSE 0 END) AS overdue_amt
WHERE total_outstanding > 0  // skip branches with nothing outstanding (avoids division by zero)
RETURN b.branch_code, round(overdue_amt * 100.0 / total_outstanding, 2) AS par_pct
ORDER BY par_pct DESC LIMIT 5
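To sanity-check what the portfolio-at-risk query returns, the same arithmetic can be reproduced in plain Python on a handful of rows. This is a sketch on made-up data: the LOANS rows and the par_by_branch helper are hypothetical, and the function skips branches with zero outstanding principal so that it never divides by zero.

```python
from collections import defaultdict

# Hypothetical loan rows: (branch_code, outstanding_amt, overdue_days)
LOANS = [
    ("BR001", 1000.0, 0),
    ("BR001", 500.0, 45),
    ("BR002", 2000.0, 120),
    ("BR002", 2000.0, 0),
    ("BR003", 0.0, 0),  # nothing outstanding at this branch
]


def par_by_branch(loans):
    """Return [(branch_code, par_pct)] sorted by par_pct, highest first."""
    total = defaultdict(float)
    overdue = defaultdict(float)
    for branch, outstanding, overdue_days in loans:
        total[branch] += outstanding
        if overdue_days > 0:
            overdue[branch] += outstanding
    par = {
        branch: round(overdue[branch] * 100.0 / total[branch], 2)
        for branch in total
        if total[branch] > 0  # skip branches with no outstanding principal
    }
    return sorted(par.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for branch, pct in par_by_branch(LOANS):
        print(branch, pct)
```

For these rows BR002 has 2000 of 4000 overdue (50.0%) and BR001 has 500 of 1500 overdue (33.33%), which matches the definition used in the query: overdue outstanding amount divided by total outstanding amount.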

🔧 In the Demo

Neo4j + knowledge_graph/load_graph.py

The demo loads all 1,000 customers, 1,000 accounts, 500 loans, and the branches from PostgreSQL into Neo4j using load_graph.py. Relationships are created with MERGE, so re-running the loader does not create duplicates. The AI agent then uses an LLM served through Ollama to generate Cypher queries and executes them against this graph, answering natural language questions and reporting which nodes were traversed as provenance.
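The MERGE-based loading step might look roughly like the sketch below. It is not the contents of load_graph.py: the query text, the MERGE_OWNS constant, the batches helper, and the row keys are assumptions that follow the property names in the schema above. The pattern is a common one: send rows in batches as a parameter, UNWIND them, and MERGE on the identifying property so that repeated runs match existing nodes instead of creating new ones.

```python
# Hypothetical loader query. MERGE matches on the key property only;
# other properties are set afterwards so re-runs update rather than duplicate.
MERGE_OWNS = """
UNWIND $rows AS row
MERGE (c:Customer {cust_id: row.cust_id})
  SET c.name = row.name
MERGE (a:SavingsAccount {acct_no: row.acct_no})
  SET a.bal_amt = row.bal_amt
MERGE (c)-[r:OWNS]->(a)
  ON CREATE SET r.since = row.open_date
"""


def batches(rows, size):
    """Yield consecutive slices of rows, each with at most size items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


if __name__ == "__main__":
    # Made-up rows standing in for the result of a PostgreSQL query.
    rows = [
        {"cust_id": f"CUST_{i:04d}", "name": f"Customer {i}",
         "acct_no": f"SA_{i:04d}", "bal_amt": 100.0 * i,
         "open_date": "2020-01-01"}
        for i in range(5)
    ]
    for batch in batches(rows, 2):
        print(len(batch), "rows would be sent with MERGE_OWNS")
```

With the official Neo4j Python driver, each batch could be passed as a query parameter, for example session.run(MERGE_OWNS, rows=batch). A uniqueness constraint on cust_id and acct_no is usually worth adding so that MERGE can look nodes up by index.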