Data Lineage
Documentation of where data originates, how it is transformed, and how it moves through systems, allowing insurers to trace model inputs back to their sources for audit and review.
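As a minimal sketch, lineage can be modeled as a directed graph where each edge records that one dataset was derived from another by a named transformation. Tracing a model input back to its source then means walking those edges upstream. The dataset names and transformation descriptions below are hypothetical. Real lineage tools capture far richer metadata, such as column-level mappings, run timestamps, and code versions.

```python
from collections import deque

# Hypothetical lineage graph: each derived dataset maps to the
# (upstream dataset, transformation) pairs it was built from.
LINEAGE = {
    "pricing_model.features": [
        ("curated.loss_ratio", "join on policy_id, select rating fields"),
        ("curated.policy_terms", "filter to in-force policies"),
    ],
    "curated.loss_ratio": [
        ("claims_system.paid_losses", "aggregate paid losses by policy"),
        ("policy_admin.earned_premium", "aggregate earned premium by policy"),
    ],
    "curated.policy_terms": [
        ("policy_admin.policies", "deduplicate and standardize fields"),
    ],
}


def trace_to_sources(dataset: str) -> list[tuple[str, str, str]]:
    """Walk upstream from `dataset` breadth-first and return every
    (derived, upstream, transformation) edge encountered."""
    edges = []
    seen = {dataset}
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for upstream, transform in LINEAGE.get(current, []):
            edges.append((current, upstream, transform))
            if upstream not in seen:  # avoid revisiting shared upstream tables
                seen.add(upstream)
                queue.append(upstream)
    return edges


def source_systems(dataset: str) -> set[str]:
    """Return the root datasets (no recorded upstream) that feed `dataset`."""
    return {up for _, up, _ in trace_to_sources(dataset) if up not in LINEAGE}


if __name__ == "__main__":
    for derived, upstream, transform in trace_to_sources("pricing_model.features"):
        print(f"{derived} <- {upstream}: {transform}")
    print(sorted(source_systems("pricing_model.features")))
```

Running the script prints each derivation step from the hypothetical model feature set back through the curated tables. It then lists the three root sources in the claims and policy administration systems, which is the chain an auditor would ask to see.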
FAQs
- Is data lineage documentation required by insurance regulators?
- No single regulation explicitly mandates a data lineage system, but several existing obligations require the documentation that lineage provides. Actuarial standards require documentation of the data sources and adjustments used in reserving and pricing analyses. Responding to market conduct examinations requires the ability to trace rating variables back to source data. The NAIC model bulletin on insurers' use of AI systems expects insurers to be able to explain model inputs. Lineage tooling makes meeting these obligations practical at scale.
- How long do we need to retain data lineage records?
- Retention requirements depend on the regulatory obligation the lineage supports. Rate filing documentation may need to be retained for the life of the filing plus the applicable regulatory examination look-back period. Lineage for claims data that supports reserve calculations typically follows the claims record retention period, which in many states is five to ten years and can be longer for long-tail lines.
- Can data lineage tools handle lineage across multiple cloud environments and legacy on-premises systems?
- Modern enterprise data catalog and lineage tools support heterogeneous environments through connectors to common data platforms, ETL tools, and database engines. Legacy systems may need custom connectors, or manual documentation for pipelines that run outside standard platforms. A pragmatic approach is to capture lineage first for data that feeds AI models and regulatory reports.
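The prioritization in the last answer above can be sketched as a graph query. Start from the critical consumers (AI models and regulatory reports) and collect every dataset upstream of them. Then flag which of those datasets lack automated connector coverage and therefore need a custom connector or manual documentation. The graph, the dataset names, and the `connector`/`manual` coverage labels are all hypothetical, chosen only to illustrate the approach.

```python
# Hypothetical lineage edges: derived dataset -> upstream datasets.
UPSTREAM = {
    "report.statutory_reserves": ["warehouse.claims_triangles"],
    "model.fraud_score": ["warehouse.claims_features"],
    "warehouse.claims_triangles": ["mainframe.claims_ledger"],
    "warehouse.claims_features": ["mainframe.claims_ledger", "cloud.fnol_events"],
    "dashboard.marketing_funnel": ["cloud.web_sessions"],
}

# Hypothetical coverage: "connector" means a catalog connector captures lineage
# automatically; anything else is assumed to need manual documentation.
COVERAGE = {
    "warehouse.claims_triangles": "connector",
    "warehouse.claims_features": "connector",
    "mainframe.claims_ledger": "manual",
    "cloud.fnol_events": "connector",
    "cloud.web_sessions": "connector",
}

# Consumers whose inputs should be traceable first.
CRITICAL = ["report.statutory_reserves", "model.fraud_score"]


def upstream_closure(targets):
    """Return every dataset reachable upstream from the given targets."""
    found = set()
    stack = list(targets)
    while stack:
        node = stack.pop()
        for parent in UPSTREAM.get(node, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def lineage_backlog(critical):
    """Split datasets feeding critical consumers into connector-covered ones
    and ones needing custom connectors or manual documentation."""
    in_scope = upstream_closure(critical)
    automated = sorted(d for d in in_scope if COVERAGE.get(d) == "connector")
    # Unknown coverage is treated conservatively as needing manual work.
    manual = sorted(d for d in in_scope if COVERAGE.get(d) != "connector")
    return automated, manual


if __name__ == "__main__":
    automated, manual = lineage_backlog(CRITICAL)
    print("Connector coverage:", automated)
    print("Needs custom connector or manual docs:", manual)
```

In this example the mainframe claims ledger is the one in-scope dataset flagged for manual work. The marketing dashboard's source is left out entirely, since it feeds neither a model nor a regulatory report and, under this approach, can wait.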
Related Terms
Model Governance
Policies, controls, and oversight processes managing the full lifecycle of predictive and AI models from development through retirement.
AI Model Audit
A structured review of an AI or statistical model's design, training data, outputs, and deployment to verify accuracy, fairness, and regulatory compliance.
Insurance Data Lake
A centralized repository storing large volumes of raw structured and unstructured insurance data in native format for analytics, modeling, and reporting.
MLOps Insurance
Practices adapting machine learning operations to insurance: model versioning, deployment pipelines, monitoring, retraining, and regulatory documentation.
