Data Governance in Banking

The policies, roles, and processes that ensure data is accurate, consistent, secure, and used responsibly across the enterprise.

Overview


What

Data governance is the framework of rules, roles, processes, and standards that define who can do what with which data — and ensures data is fit for purpose.

How

Implemented through data catalogs (OpenMetadata, Collibra), business glossaries, data stewardship programs, access control policies (RBAC/ABAC), and lineage tracking.

Why

Without governance, banks have "data swamps" — nobody knows what data means, who owns it, or whether it's trustworthy. Regulators fine banks for poor data quality in risk reports.


Where

Every data-producing and -consuming system: core banking, DWH, reporting layers, risk systems, customer portals, and AI platforms.

Importance

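BCBS 239 requires global systemically important banks to aggregate and report risk data accurately, completely, and on time, and national supervisors are encouraged to apply it to domestic systemically important banks as well. Poor governance leads directly to regulatory breaches, wrong risk calculations, and loss of customer trust.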

Core Concepts

The Building Blocks of Data Governance

Governance is not a single tool — it is a combination of people, process, and technology.

📄 Data Catalog

An inventory of all data assets — databases, tables, columns, pipelines — with technical metadata automatically discovered and business metadata enriched by stewards. Think of it as the "library catalogue" for all enterprise data.
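
A minimal sketch of the split between technical and business metadata, assuming a SQLite source and Python's standard `sqlite3` module. The `ColumnEntry` and `TableEntry` classes and the `discover_sqlite` function are hypothetical names for illustration, not the API of OpenMetadata or any other catalog product. Discovery fills in names, types, and row counts, while the owner and description fields stay empty until a steward enriches them.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ColumnEntry:
    name: str
    data_type: str          # technical metadata, discovered automatically
    description: str = ""   # business metadata, written later by a steward


@dataclass
class TableEntry:
    name: str
    row_count: int
    columns: List[ColumnEntry] = field(default_factory=list)
    owner: Optional[str] = None  # business metadata, assigned by governance


def discover_sqlite(conn: sqlite3.Connection) -> Dict[str, TableEntry]:
    """Build catalog entries from SQLite's own schema metadata (hypothetical helper)."""
    catalog: Dict[str, TableEntry] = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
        columns = [
            ColumnEntry(name=row[1], data_type=row[2])
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        catalog[table] = TableEntry(name=table, row_count=count, columns=columns)
    return catalog


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, customer_id INTEGER, balance REAL)"
    )
    conn.executemany(
        "INSERT INTO loans (customer_id, balance) VALUES (?, ?)",
        [(1, 500.0), (2, 1200.0)],
    )
    catalog = discover_sqlite(conn)
    # A steward later enriches the business side of the entry.
    catalog["loans"].owner = "Retail Credit"
    catalog["loans"].columns[2].description = "Outstanding principal"
    print(catalog["loans"].row_count)  # 2
```

Keeping the two kinds of metadata in one entry mirrors how a catalog works: re-running discovery refreshes the technical fields, while the steward-owned fields carry the business meaning.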

📚 Business Glossary

A controlled vocabulary defining business terms precisely: What is a "Non-Performing Loan"? What does "Balance" mean in each system? Glossaries eliminate ambiguity between IT and business teams.
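
A glossary is most useful when its terms are linked to physical columns, so that two fields both named `balance` can be told apart. The sketch below is illustrative: the term definitions, the `system.table.column` identifiers, and the `terms_for_column` and `is_non_performing` helpers are assumptions. The 90-days-past-due threshold is the commonly used non-performing criterion; each bank should take the exact rule from its own regulator.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GlossaryTerm:
    name: str
    definition: str
    linked_columns: Tuple[str, ...] = ()  # "system.table.column" identifiers


# Illustrative terms; real definitions come from the bank's policy documents.
GLOSSARY = [
    GlossaryTerm(
        "Non-Performing Loan",
        "A loan whose principal or interest is overdue for more than 90 days.",
        ("core.loans.days_past_due",),
    ),
    GlossaryTerm(
        "Ledger Balance",
        "Balance of posted transactions at end of day, excluding pending items.",
        ("core.accounts.balance", "dwh.fact_account.ledger_bal"),
    ),
    GlossaryTerm(
        "Available Balance",
        "Ledger balance less holds and pending debits.",
        ("cards.accounts.balance",),
    ),
]

NPL_DAYS_PAST_DUE = 90  # threshold behind the "Non-Performing Loan" term


def terms_for_column(column: str) -> List[str]:
    """Return the glossary terms that define what a physical column means."""
    return [term.name for term in GLOSSARY if column in term.linked_columns]


def is_non_performing(days_past_due: int) -> bool:
    """Apply the glossary definition so every system classifies loans the same way."""
    return days_past_due > NPL_DAYS_PAST_DUE
```

Here `terms_for_column("core.accounts.balance")` returns `["Ledger Balance"]` while `terms_for_column("cards.accounts.balance")` returns `["Available Balance"]`: the same column name carries two different business meanings.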

🔗 Data Lineage

A visual map showing where data originates, how it transforms, and where it lands. Crucial for impact analysis ("if I change this column, what reports break?") and regulatory audit trails.
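
Impact analysis and audit tracing are both graph traversals over lineage edges. A minimal sketch, assuming column-level lineage is already available as a dictionary of `source -> [targets]` edges; the node names loosely follow a raw, staging, and mart layering but are hypothetical.

```python
from collections import deque
from typing import Dict, List

# Hypothetical column-level lineage: each key feeds the columns listed for it.
LINEAGE: Dict[str, List[str]] = {
    "raw.loans.overdue_days": ["staging.stg_loans.days_past_due"],
    "staging.stg_loans.days_past_due": [
        "mart.customer_360.overdue_loan_count",
        "mart.portfolio.par_30",
    ],
    "mart.portfolio.par_30": ["report.risk_dashboard.par_30"],
}


def _reachable(start: str, graph: Dict[str, List[str]]) -> List[str]:
    """Breadth-first search: every node reachable from start, nearest first."""
    seen, order = {start}, []
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited (guards against cycles and diamonds)
        seen.add(node)
        order.append(node)
        queue.extend(graph.get(node, []))
    return order


def downstream(node: str) -> List[str]:
    """Impact analysis: what breaks if this column changes?"""
    return _reachable(node, LINEAGE)


def upstream(node: str) -> List[str]:
    """Audit trail: which sources does this report field come from?"""
    reverse: Dict[str, List[str]] = {}
    for source, targets in LINEAGE.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return _reachable(node, reverse)
```

Changing `raw.loans.overdue_days` flags the dashboard's `par_30` field downstream, and tracing that field upstream leads back to the same raw column.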

👥 Data Stewardship

Assigned owners and stewards per data domain. The Data Owner is accountable; the Data Steward manages day-to-day quality. Without clear ownership, nobody fixes data problems.
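
Ownership only helps if issues actually reach a named person. A minimal sketch of a stewardship registry, assuming one owner and one steward per domain; the domain names, role titles, and the `data-governance-office` fallback queue are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

FALLBACK = "data-governance-office"  # hypothetical escalation queue


@dataclass(frozen=True)
class DomainOwnership:
    domain: str
    owner: Optional[str]    # accountable for the domain's data
    steward: Optional[str]  # manages day-to-day quality


REGISTRY: Dict[str, DomainOwnership] = {
    "loans": DomainOwnership("loans", "Head of Retail Credit", "loans-steward"),
    "customer": DomainOwnership("customer", "Head of Customer Data", None),
    "treasury": DomainOwnership("treasury", None, None),
}


def route_issue(domain: str) -> str:
    """Send a data quality issue to the steward, then the owner, then the fallback."""
    entry = REGISTRY.get(domain)
    if entry is not None and entry.steward:
        return entry.steward
    if entry is not None and entry.owner:
        return entry.owner
    return FALLBACK


def ownership_gaps() -> List[str]:
    """Domains missing an owner or a steward, where problems may go unfixed."""
    return sorted(d for d, e in REGISTRY.items() if not e.owner or not e.steward)
```

Reporting `ownership_gaps()` regularly turns "nobody fixes data problems" into a concrete list of domains that still need an assignment.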

🔒 Access Control

RBAC (Role-Based) and ABAC (Attribute-Based) policies control who can read, modify, or export sensitive data fields (PAN, Aadhaar, account balances). They are enforced at both the catalog and compute layers.
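
The difference between the two models can be sketched in a few lines: RBAC decides which columns a role may see at all, and ABAC then uses attributes of the user and the record to decide whether a sensitive value is shown in clear or masked. The roles, the `region` and `purpose` attributes, and the mask-all-but-last-four rule are assumptions for illustration; in the setup described here, enforcement would sit in the catalog and compute layers rather than in application code like this.

```python
from dataclasses import dataclass
from typing import Any, Dict, Set

# RBAC: the columns each role may read at all (hypothetical roles).
ROLE_GRANTS: Dict[str, Set[str]] = {
    "kyc_officer": {"customer_id", "pan", "aadhaar", "balance"},
    "risk_analyst": {"customer_id", "pan", "balance"},
    "marketing": {"customer_id"},
}

SENSITIVE = {"pan", "aadhaar"}


@dataclass(frozen=True)
class User:
    role: str
    region: str
    purpose: str


def mask(value: str) -> str:
    """Mask all but the last four characters; short values are masked entirely."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def read_row(user: User, row: Dict[str, Any]) -> Dict[str, Any]:
    """Return the view of one record that this user is allowed to see."""
    granted = ROLE_GRANTS.get(user.role, set())
    result: Dict[str, Any] = {}
    for column, value in row.items():
        if column not in granted:
            continue  # RBAC: column is not visible to this role
        if column in SENSITIVE:
            # ABAC: clear text only for a KYC purpose in the customer's own region
            allowed = user.purpose == "kyc" and user.region == row.get("region")
            result[column] = value if allowed else mask(str(value))
        else:
            result[column] = value
    return result
```

Layering the checks this way keeps role definitions coarse and stable, while the attribute rule handles context (region, purpose) without multiplying roles.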

📊 Data Quality Management

Automated profiling, quality rules, and alerting. Banks use this to catch nulls in critical fields, flag duplicate customer IDs, and enforce referential integrity across systems.
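
The three checks named here can be expressed as simple rules over rows. A minimal sketch in plain Python, assuming rows arrive as dictionaries; the table names, column names, and rule labels are hypothetical, and a tool such as Great Expectations would normally express the same rules declaratively.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def null_violations(rows: List[Row], column: str) -> List[int]:
    """Indexes of rows where a critical field is missing."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def duplicate_values(rows: List[Row], column: str) -> List[Any]:
    """Values that appear more than once in a column that should be unique."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


def orphan_keys(children: List[Row], fk: str, parents: List[Row], pk: str) -> List[Any]:
    """Foreign-key values with no matching parent row (referential integrity)."""
    parent_keys = {row[pk] for row in parents}
    return [
        row[fk]
        for row in children
        if row.get(fk) is not None and row[fk] not in parent_keys
    ]


def run_checks(customers: List[Row], loans: List[Row]) -> Dict[str, list]:
    """Run every rule and keep only the failures, ready to feed an alert."""
    results = {
        "customers.pan not null": null_violations(customers, "pan"),
        "customers.customer_id unique": duplicate_values(customers, "customer_id"),
        "loans.customer_id references customers": orphan_keys(
            loans, "customer_id", customers, "customer_id"
        ),
    }
    return {rule: failures for rule, failures in results.items() if failures}
```

An empty result means every rule passed; anything else names the failing rule and the offending rows or keys, which is what a steward needs to start fixing the data.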

⚠ Why Governance Fails — and What it Costs

Regulatory Context

BCBS 239: The Governance Standard for Banks

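BCBS 239 is the Basel Committee on Banking Supervision's publication No. 239, "Principles for effective risk data aggregation and risk reporting" (January 2013). Its 14 principles set the bar for risk data aggregation and reporting: Principles 1–11, summarised below, apply to banks, while Principles 12–14 cover supervisory review.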

Principle 1 — Governance

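Risk data aggregation and risk reporting must be subject to strong governance arrangements. The board and senior management are accountable for managing data quality risk as part of the overall risk management framework, and the bank's capabilities must be fully documented and independently validated.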

Principle 2 — Data Architecture & IT Infrastructure

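Data architecture and IT infrastructure must support risk data aggregation in normal times and under stress, with integrated data taxonomies, consistent metadata, and single identifiers or unified naming conventions across the banking group. No siloed definitions of "exposure" or "counterparty."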

Principles 3–6 — Risk Data Aggregation Capabilities

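Accuracy and integrity (3), completeness (4), timeliness (5), and adaptability (6). Banks must be able to aggregate accurate group-wide risk positions on a largely automated basis, quickly enough to meet reporting deadlines and ad hoc requests during stress or crisis, rather than through slow manual reconciliation.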
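
Aggregation and completeness go together: a group-wide total is only trustworthy if every expected entity has reported and each feed reconciles to its own control total. A minimal sketch, assuming each legal entity submits `(counterparty, exposure)` rows; the entity names, control totals, and one-cent tolerance are hypothetical, and `Decimal` is used so that monetary sums are exact.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, List, Optional, Tuple

Feed = List[Tuple[str, Decimal]]  # (counterparty, exposure) rows from one entity


def aggregate_exposures(feeds: Dict[str, Feed]) -> Dict[str, Decimal]:
    """Group-wide exposure per counterparty, summed across all entities."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for rows in feeds.values():
        for counterparty, amount in rows:
            totals[counterparty] += amount
    return dict(totals)


def missing_entities(feeds: Dict[str, Feed], expected: Iterable[str]) -> List[str]:
    """Completeness: entities that should have reported but did not."""
    return sorted(set(expected) - set(feeds))


def reconciliation_breaks(
    feeds: Dict[str, Feed],
    control_totals: Dict[str, Decimal],
    tolerance: Decimal = Decimal("0.01"),
) -> Dict[str, Tuple[Decimal, Optional[Decimal]]]:
    """Accuracy: entities whose reported total differs from their control total."""
    breaks: Dict[str, Tuple[Decimal, Optional[Decimal]]] = {}
    for entity, rows in feeds.items():
        reported = sum((amount for _, amount in rows), Decimal("0"))
        expected = control_totals.get(entity)
        if expected is None or abs(reported - expected) > tolerance:
            breaks[entity] = (reported, expected)
    return breaks
```

With feeds from `bank_in` and `bank_uk` but an expected list that also includes `bank_sg`, `missing_entities` flags `bank_sg`, so the aggregated counterparty totals should not be reported as complete until that gap and any reconciliation breaks are resolved.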

Principles 7–11 — Risk Reporting Practices

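Accuracy (7), comprehensiveness (8), clarity and usefulness (9), frequency (10), and distribution (11). Reports must be accurate, clear, and comprehensive, produced more frequently during stress or crisis, and distributed promptly to the relevant recipients while preserving confidentiality.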

🔧 In the Demo

OpenMetadata as the Governance Layer

The demo runs OpenMetadata in Docker and ingests all three of its databases: PostgreSQL, MySQL, and SQLite. Ingestion automatically discovers tables, columns, and row counts. Business glossary terms such as portfolio_at_risk and overdue_loan are defined and linked to columns. Lineage is tracked from raw tables → dbt staging → customer_360 mart, and Great Expectations publishes its validation results back into the catalog as data quality evidence.