What
Data governance is the framework of rules, roles, processes, and standards that defines who can do what with which data, and ensures that the data is fit for purpose.
How
Implemented through data catalogs (OpenMetadata, Collibra), business glossaries, data stewardship programs, access control policies (RBAC/ABAC), and lineage tracking.
Why
Without governance, banks have "data swamps" — nobody knows what data means, who owns it, or whether it's trustworthy. Regulators fine banks for poor data quality in risk reports.
Where
Every data-producing and -consuming system: core banking, DWH, reporting layers, risk systems, customer portals, and AI platforms.
Importance
BCBS 239 requires banks to aggregate risk data accurately. Poor governance directly causes regulatory breaches, wrong risk calculations, and loss of customer trust.
The Building Blocks of Data Governance
Governance is not a single tool — it is a combination of people, process, and technology.
📄 Data Catalog
An inventory of all data assets — databases, tables, columns, pipelines — with technical metadata automatically discovered and business metadata enriched by stewards. Think of it as the "library catalogue" for all enterprise data.
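As a rough illustration of the "technical metadata automatically discovered, business metadata enriched by stewards" split, the sketch below scans an in-memory SQLite database for tables, columns, and row counts, then attaches a steward-written description. It is not how any particular catalog product works internally; the table names, sample rows, and the discover_catalog and enrich helpers are all hypothetical.

```python
import sqlite3


def discover_catalog(conn):
    """Return {table: {"columns": [(name, type), ...], "row_count": int}}."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = [
            (row[1], row[2])
            for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        catalog[table] = {"columns": columns, "row_count": row_count}
    return catalog


def enrich(catalog, steward_notes):
    """Attach steward-written descriptions, keyed by (table, column)."""
    for (table, column), description in steward_notes.items():
        if table in catalog:
            catalog[table].setdefault("descriptions", {})[column] = description
    return catalog


# Hypothetical sample schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, customer_id INTEGER, days_overdue INTEGER);
INSERT INTO customers VALUES (1, 'A. Rao'), (2, 'B. Singh');
INSERT INTO loans VALUES (10, 1, 0), (11, 1, 45), (12, 2, 120);
""")

catalog = enrich(
    discover_catalog(conn),
    {("loans", "days_overdue"): "Days past the contractual due date."},
)
print(catalog["loans"]["row_count"])  # 3
```

The discovery step needs no human input, which is why catalogs can run it on a schedule; the descriptions are the part that only a steward can supply.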
📚 Business Glossary
A controlled vocabulary defining business terms precisely: What is a "Non-Performing Loan"? What does "Balance" mean in each system? Glossaries eliminate ambiguity between IT and business teams.
🔗 Data Lineage
A visual map showing where data originates, how it transforms, and where it lands. Crucial for impact analysis ("if I change this column, what reports break?") and regulatory audit trails.
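The impact-analysis question ("if I change this column, what reports break?") amounts to a downstream traversal of the lineage graph. A minimal sketch, assuming lineage is stored as a plain adjacency map; apart from customer_360, every asset name here is made up for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets that read from it.
LINEAGE = {
    "raw.loans": ["staging.stg_loans"],
    "raw.customers": ["staging.stg_customers"],
    "staging.stg_loans": ["mart.customer_360"],
    "staging.stg_customers": ["mart.customer_360"],
    "mart.customer_360": ["report.npl_dashboard", "report.risk_summary"],
}


def downstream(asset, lineage):
    """Breadth-first walk returning every asset that depends on `asset`."""
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


impacted = downstream("raw.loans", LINEAGE)
print(impacted)
# ['mart.customer_360', 'report.npl_dashboard', 'report.risk_summary', 'staging.stg_loans']
```

Reversing the edges and running the same walk gives the audit-trail direction: from a reported number back to its sources.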
👥 Data Stewardship
Assigned owners and stewards per data domain. The Data Owner is accountable; the Data Steward manages day-to-day quality. Without clear ownership, nobody fixes data problems.
🔒 Access Control
RBAC (role-based) and ABAC (attribute-based) policies control who can read, modify, or export sensitive data fields such as PAN, Aadhaar numbers, and account balances. They are enforced at both the catalog and the compute layer.
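The difference between the two models is easiest to see side by side: RBAC asks "does this role permit the action?", while ABAC adds conditions on attributes of the user and the data. The sketch below layers one on the other. The roles, the pii_cleared and region attributes, and the is_allowed helper are assumptions chosen for illustration, not a real policy engine.

```python
from dataclasses import dataclass

# Hypothetical role-to-action mapping (the RBAC part).
ROLE_PERMISSIONS = {
    "risk_analyst": {"read"},
    "data_steward": {"read", "modify"},
    "compliance_officer": {"read", "export"},
}

# Fields treated as sensitive in this sketch.
SENSITIVE_FIELDS = {"pan", "aadhaar", "account_balance"}


@dataclass(frozen=True)
class User:
    name: str
    role: str
    region: str
    pii_cleared: bool


def is_allowed(user, action, field, data_region):
    # RBAC: the role must grant the requested action at all.
    if action not in ROLE_PERMISSIONS.get(user.role, set()):
        return False
    # ABAC: sensitive fields also require PII clearance and a matching region.
    if field in SENSITIVE_FIELDS:
        return user.pii_cleared and user.region == data_region
    return True


analyst = User("asha", "risk_analyst", region="IN", pii_cleared=False)
officer = User("ravi", "compliance_officer", region="IN", pii_cleared=True)

print(is_allowed(analyst, "read", "loan_status", "IN"))  # True
print(is_allowed(analyst, "read", "pan", "IN"))          # False
print(is_allowed(officer, "export", "pan", "IN"))        # True
print(is_allowed(officer, "export", "pan", "EU"))        # False
```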
📊 Data Quality Management
Automated profiling, quality rules, and alerting. Banks use this to catch nulls in critical fields, flag duplicate customer IDs, and enforce referential integrity across systems.
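The three checks named above (nulls in critical fields, duplicate customer IDs, referential integrity) can be sketched in a few lines of plain Python. Dedicated quality tools express the same rules declaratively; the helper names and sample records below, including the PAN values, are invented for illustration.

```python
from collections import Counter


def null_rows(rows, field):
    """Indexes of rows where a critical field is missing."""
    return [i for i, row in enumerate(rows) if row.get(field) is None]


def duplicate_values(rows, field):
    """Values of `field` that appear more than once."""
    counts = Counter(row[field] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def orphaned_keys(child_rows, fk, parent_rows, pk):
    """Foreign-key values in the child set with no matching parent row."""
    parent_keys = {row[pk] for row in parent_rows}
    return sorted({row[fk] for row in child_rows} - parent_keys)


# Hypothetical sample data with one defect of each kind.
customers = [
    {"customer_id": 1, "pan": "ABCDE1234F"},
    {"customer_id": 2, "pan": None},
    {"customer_id": 2, "pan": "PQRSX5678K"},
]
loans = [
    {"loan_id": 10, "customer_id": 1},
    {"loan_id": 11, "customer_id": 3},
]

print(null_rows(customers, "pan"))                                    # [1]
print(duplicate_values(customers, "customer_id"))                     # [2]
print(orphaned_keys(loans, "customer_id", customers, "customer_id"))  # [3]
```

In practice each non-empty result would raise an alert to the steward of the affected domain rather than just print.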
⚠ Why Governance Fails — and What it Costs
- BCBS 239 Violations: Inaccurate risk data aggregation leads to regulatory fines and mandatory remediation programs costing millions.
- Shadow Data: Ungoverned spreadsheets become the de facto "system of record"; calculations diverge and auditors can't trace numbers.
- AI Hallucinations: AI agents trained on ungoverned data inherit every ambiguity and error, producing wrong answers with false confidence.
- Data Breaches: Without access policies, over-privileged users expose PII, violating GDPR and RBI data localisation norms.
- M&A Integration Failures: When two banks merge, ungoverned data makes customer deduplication and portfolio consolidation nearly impossible.
BCBS 239: The Governance Standard for Banks
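BCBS 239 is the Basel Committee on Banking Supervision's standard "Principles for effective risk data aggregation and risk reporting" (2013). Of its 14 principles, Principles 1–11 apply to banks and are summarised below; Principles 12–14 are addressed to supervisors.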
Principle 1 — Governance
Board and senior-management accountability for data quality in risk reporting. In practice, this often means a senior risk executive such as the CRO signs off on data lineage from source to risk report.
Principle 2 — Data Architecture & IT Infrastructure
A single integrated data taxonomy across all risk systems. No siloed definitions of "exposure" or "counterparty."
Principles 3–6 — Risk Data Aggregation Capabilities
Accuracy, integrity, completeness, timeliness, and adaptability. Banks must be able to aggregate group-wide risk positions within hours, not days.
Principles 7–11 — Risk Reporting Practices
Accurate, comprehensive, and clear reports that can be produced on demand, including under stress scenarios. These principles also cover reporting frequency and distribution policies.
OpenMetadata as the Governance Layer
The demo runs OpenMetadata in Docker and ingests all three databases: PostgreSQL, MySQL, and SQLite. It automatically discovers tables, columns, and row counts. Business glossary terms like portfolio_at_risk and overdue_loan are defined and linked to columns. Lineage is tracked from raw tables → dbt staging → customer_360 mart. Great Expectations plugs quality evidence directly back into the catalog.