What
A knowledge graph is a network of entities (nodes) and their relationships (edges). In banking: Customers → Accounts → Loans → Branches, with relationships carrying transaction history, ownership stakes, and policy metadata.
How
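Graph databases such as Neo4j store this structure natively: each node holds direct references to its relationships, so a traversal follows stored links instead of computing JOINs across normalized tables. Queries are written in Cypher, and short traversals on a well-modeled graph typically return in milliseconds.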
Why
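Relational tables handle multi-hop questions poorly. "Find all entities connected to this customer within 3 hops" needs chained self-JOINs or a recursive CTE, and the cost grows with every hop. That pattern is exactly what AML network analysis, fraud detection, and customer 360 require.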
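A minimal sketch of the "within 3 hops" pattern, written as a breadth-first search over an in-memory adjacency map. The node ids and the toy edge list are hypothetical, and edges are treated as undirected. In Cypher the same bound is expressed with a variable-length pattern such as (c)-[*1..3]-(n).

```python
from collections import deque


def within_hops(adjacency, start, max_hops=3):
    """Return {node: hop_distance} for every node within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]  # the start node is not its own neighbour
    return distances


# Hypothetical toy graph: customer -> account -> branch -> account -> customer.
EDGES = [
    ("CUST_0042", "ACCT_1"),
    ("ACCT_1", "BRANCH_A"),
    ("BRANCH_A", "ACCT_2"),
    ("ACCT_2", "CUST_0099"),
]

ADJACENCY = {}
for a, b in EDGES:
    ADJACENCY.setdefault(a, set()).add(b)
    ADJACENCY.setdefault(b, set()).add(a)

reachable = within_hops(ADJACENCY, "CUST_0042", max_hops=3)
print(reachable)
```

CUST_0099 sits four hops away, so it is excluded; raising max_hops to 4 would include it.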
Where
AML network analysis, KYC entity resolution, customer 360 profiling, product recommendation, correspondent banking risk, and AI agent query backends.
Importance
Knowledge graphs are a natural substrate for AI agents. They supply the structured, relationship-rich context that LLMs need to answer banking questions from stored facts, which reduces the risk of hallucinated answers.
Graph vs Relational — When to Use Which
| Query Type | Relational (SQL) | Graph (Cypher) |
|---|---|---|
| Customer total exposure | 3-table JOIN — feasible | 1-hop traversal — fast |
| Find all accounts at same branch | WHERE clause — easy | Pattern match — easy |
| Customers with both savings + overdue loan | Complex JOIN + subquery | Simple 2-hop pattern |
| AML: who shares a phone with a flagged customer | Self-JOIN across millions of rows | Direct edge traversal |
| Aggregated balance report | GROUP BY — optimal | Possible, but not native strength |
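To make the "shares a phone with a flagged customer" row concrete, here is a small sketch that answers it two ways: a SQL self-JOIN in an in-memory SQLite database, and a graph-style lookup that walks customer → phone → customer. The customer table, its phone and flagged columns, and the sample rows are all hypothetical; the schema described in this document does not list a phone property.

```python
import sqlite3

# Hypothetical sample rows: (cust_id, phone, flagged)
ROWS = [
    ("CUST_0001", "555-0100", 1),
    ("CUST_0002", "555-0100", 0),
    ("CUST_0003", "555-0199", 0),
]

# Relational view: a self-JOIN of the customer table on the phone column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id TEXT PRIMARY KEY, phone TEXT, flagged INTEGER)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", ROWS)

SHARED_PHONE_SQL = """
SELECT DISTINCT o.cust_id
FROM customer AS f
JOIN customer AS o
  ON o.phone = f.phone
 AND o.cust_id <> f.cust_id
WHERE f.flagged = 1
ORDER BY o.cust_id
"""
sql_hits = [row[0] for row in conn.execute(SHARED_PHONE_SQL)]

# Graph view: treat each phone as a node and follow customer -> phone -> customer.
customers_by_phone = {}
for cust_id, phone, _ in ROWS:
    customers_by_phone.setdefault(phone, set()).add(cust_id)

graph_hits = sorted(
    {
        other
        for cust_id, phone, is_flagged in ROWS
        if is_flagged
        for other in customers_by_phone[phone]
        if other != cust_id
    }
)

print(sql_hits, graph_hits)
```

Both approaches return the same customers. The difference is in how the work scales: the self-JOIN compares rows of the table against each other, while the graph lookup only touches the neighbours of the flagged node.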
Banking Knowledge Graph Schema
Nodes, properties, and relationships that capture the banking domain.
👥 Customer Node
Properties: cust_id, name, city, kyc_status, risk_category. The anchor entity — everything else connects through Customer.
💵 Account Nodes
SavingsAccount: acct_no, bal_amt, open_date, status.
CurrentAccount: same structure, different type label enabling type-specific policies.
📈 LoanAccount Node
Properties: loan_id, principal_amt, outstanding_amt, overdue_days, loan_type. A loan with overdue_days > 90 is flagged as a non-performing asset (NPA).
🏠 Branch Node
Properties: branch_code, city, region. Used for geographic concentration risk analysis, for example finding which branch has the highest portfolio at risk (PAR).
→ OWNS Relationship
(Customer)-[:OWNS]->(SavingsAccount)
(Customer)-[:OWNS]->(LoanAccount)
The ownership edge carries a since date property for temporal queries.
→ HELD_AT / DISBURSED_AT
(Account)-[:HELD_AT]->(Branch)
(LoanAccount)-[:DISBURSED_AT]->(Branch)
These edges support branch-level concentration analysis and geographic risk heatmaps.
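The schema above can be mirrored in a few Python types, which is handy for validating rows before they are loaded. This is an illustrative sketch only: the class names, the Relationship helper, and the sample values are assumptions, while the property names and the 90-day NPA threshold come from the schema description.

```python
from dataclasses import dataclass

NPA_THRESHOLD_DAYS = 90  # from the schema: overdue_days > 90 -> NPA flag


@dataclass(frozen=True)
class LoanAccount:
    loan_id: str
    principal_amt: float
    outstanding_amt: float
    overdue_days: int
    loan_type: str

    @property
    def is_npa(self) -> bool:
        """True when the loan is overdue by more than the NPA threshold."""
        return self.overdue_days > NPA_THRESHOLD_DAYS


@dataclass(frozen=True)
class Relationship:
    start_label: str
    rel_type: str
    end_label: str

    def pattern(self) -> str:
        """Render the relationship as a Cypher-style pattern string."""
        return f"({self.start_label})-[:{self.rel_type}]->({self.end_label})"


# The relationship types listed in the schema section.
SCHEMA = [
    Relationship("Customer", "OWNS", "SavingsAccount"),
    Relationship("Customer", "OWNS", "LoanAccount"),
    Relationship("Account", "HELD_AT", "Branch"),
    Relationship("LoanAccount", "DISBURSED_AT", "Branch"),
]

# Hypothetical sample loan, 120 days overdue.
loan = LoanAccount("LN_001", 100000.0, 80000.0, 120, "home")
print(loan.is_npa)
for rel in SCHEMA:
    print(rel.pattern())
```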
🔍 Banking Use Cases for Knowledge Graphs
- Customer 360: A single-hop query aggregates all accounts, loans, and transactions for any customer. No JOIN across 10 tables.
- AML Typology Detection: Circular funds flow and layering through multiple accounts are patterns that graph matching catches and rule engines miss.
- KYC Entity Resolution: A shared phone number, address, or director links multiple "different" customers, and graph deduplication finds them.
- Portfolio Concentration Risk: "Which branch has >30% of total loan exposure?" Traverse DISBURSED_AT edges and aggregate outstanding amounts.
- Product Recommendation: Customers with a savings account but no loan, and a profile similar to existing loan customers, are candidates for graph-based collaborative filtering.
- AI Agent Backend: The LLM generates Cypher, the graph executes it in milliseconds, and the agent returns the answer with provenance (which nodes were traversed).
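As one illustration of the KYC entity resolution use case, the sketch below groups customer records that share a phone number or address, directly or through a chain of shared attributes, using a union-find structure. The record layout, field names, and sample values are hypothetical, and a production pipeline would likely normalize the values and weigh evidence rather than merge on any exact match.

```python
def resolve_entities(customers):
    """Group cust_ids that share a phone or address, directly or transitively."""
    parent = {c["cust_id"]: c["cust_id"] for c in customers}

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, value) -> first cust_id that carried it
    for c in customers:
        for field in ("phone", "address"):
            value = c.get(field)
            if not value:
                continue
            key = (field, value)
            if key in first_seen:
                union(first_seen[key], c["cust_id"])
            else:
                first_seen[key] = c["cust_id"]

    groups = {}
    for cust_id in parent:
        groups.setdefault(find(cust_id), []).append(cust_id)
    # Only groups with more than one member are candidate duplicates.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical records: C1 and C2 share a phone, C2 and C3 share an address.
RECORDS = [
    {"cust_id": "C1", "phone": "555-0100", "address": "1 Main St"},
    {"cust_id": "C2", "phone": "555-0100", "address": "9 Oak Ave"},
    {"cust_id": "C3", "phone": "555-0177", "address": "9 Oak Ave"},
    {"cust_id": "C4", "phone": "555-0188", "address": "4 Elm Rd"},
]

clusters = resolve_entities(RECORDS)
print(clusters)
```

C1 and C3 share nothing directly, yet they land in the same cluster through C2. That transitive link is the kind of connection a graph makes visible.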
// Customer total financial exposure
// coalesce() on both properties: a loan has no bal_amt, a deposit account has no outstanding_amt
MATCH (c:Customer {cust_id: 'CUST_0042'})-[:OWNS]->(a)
RETURN c.name,
       sum(coalesce(a.bal_amt, 0) + coalesce(a.outstanding_amt, 0)) AS total_exposure;

// Customers with savings AND overdue loan
MATCH (c:Customer)-[:OWNS]->(s:SavingsAccount),
      (c)-[:OWNS]->(l:LoanAccount)
WHERE l.overdue_days > 0
RETURN count(DISTINCT c) AS at_risk_customers;

// Top 5 branches by portfolio at risk (PAR)
MATCH (l:LoanAccount)-[:DISBURSED_AT]->(b:Branch)
WITH b,
     sum(l.outstanding_amt) AS total_outstanding,
     sum(CASE WHEN l.overdue_days > 0 THEN l.outstanding_amt ELSE 0 END) AS overdue_amt
WHERE total_outstanding > 0  // skip branches with nothing outstanding
RETURN b.branch_code,
       round(overdue_amt * 100.0 / total_outstanding, 2) AS par_pct
ORDER BY par_pct DESC
LIMIT 5;
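The portfolio-at-risk arithmetic in the last query can be checked by hand with a few lines of Python. This is a sketch over hypothetical loan rows, not output from the demo data: PAR per branch is the overdue outstanding amount divided by the total outstanding amount, times 100. The sketch skips branches with zero outstanding so it never divides by zero.

```python
from collections import defaultdict


def par_by_branch(loans):
    """Return [(branch_code, par_pct)] sorted by PAR, highest first."""
    total = defaultdict(float)
    overdue = defaultdict(float)
    for loan in loans:
        branch = loan["branch_code"]
        total[branch] += loan["outstanding_amt"]
        if loan["overdue_days"] > 0:
            overdue[branch] += loan["outstanding_amt"]
    par = {
        branch: round(overdue[branch] * 100.0 / amount, 2)
        for branch, amount in total.items()
        if amount > 0  # avoid division by zero for fully repaid branches
    }
    return sorted(par.items(), key=lambda item: item[1], reverse=True)


# Hypothetical loans, for illustration only.
LOANS = [
    {"branch_code": "BR_A", "outstanding_amt": 600.0, "overdue_days": 0},
    {"branch_code": "BR_A", "outstanding_amt": 400.0, "overdue_days": 30},
    {"branch_code": "BR_B", "outstanding_amt": 500.0, "overdue_days": 10},
    {"branch_code": "BR_B", "outstanding_amt": 1500.0, "overdue_days": 0},
]

ranking = par_by_branch(LOANS)
print(ranking)
```

BR_A has 400 overdue out of 1,000 outstanding (40%), and BR_B has 500 out of 2,000 (25%), so BR_A ranks first.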
Neo4j + knowledge_graph/load_graph.py
The demo loads all 1 000 customers, 1 000 accounts, 500 loans, and branches from PostgreSQL into Neo4j
using load_graph.py. Relationships are created with MERGE for idempotency.
The AI agent then uses Ollama to generate Cypher queries and executes them against this graph —
answering natural language questions with provenance of which nodes were traversed.
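The actual contents of load_graph.py are not shown here, so the following is only a sketch of why MERGE makes a load idempotent. It pairs a parameterized MERGE statement with one parameter dict per row, and uses a small in-memory stand-in (the hypothetical InMemoryGraph class) to mimic MERGE's match-or-create behaviour so the effect can be observed without a running database. The row layout and sample ids are assumptions.

```python
# Parameterized Cypher: MERGE matches an existing node or edge, or creates it.
OWNS_MERGE = (
    "MERGE (c:Customer {cust_id: $cust_id}) "
    "MERGE (a:SavingsAccount {acct_no: $acct_no}) "
    "MERGE (c)-[:OWNS]->(a)"
)


def merge_jobs(rows):
    """Pair the MERGE statement with one parameter dict per source row."""
    return [
        (OWNS_MERGE, {"cust_id": r["cust_id"], "acct_no": r["acct_no"]})
        for r in rows
    ]


class InMemoryGraph:
    """Hypothetical stand-in that mimics MERGE semantics with sets."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()

    def merge_owns(self, cust_id, acct_no):
        customer = ("Customer", cust_id)
        account = ("SavingsAccount", acct_no)
        # Adding to a set is a no-op when the element already exists.
        self.nodes.update({customer, account})
        self.edges.add((customer, "OWNS", account))


# Hypothetical rows, as they might come back from a PostgreSQL query.
ROWS = [
    {"cust_id": "CUST_0001", "acct_no": "SA_1001"},
    {"cust_id": "CUST_0001", "acct_no": "SA_1002"},
]

graph = InMemoryGraph()
for _ in range(2):  # run the load twice to show that nothing is duplicated
    for _query, params in merge_jobs(ROWS):
        graph.merge_owns(**params)

print(len(graph.nodes), len(graph.edges))
```

Against a real Neo4j instance, each (query, params) pair would be sent through the driver, for example with session.run(query, params), instead of the in-memory stand-in.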